We provide two ways to extract data from a source table and load it into another table: SSIS packages or ADO.Net.
Requirements to run ADO.Net
Most of the requirements are listed here. If you have a local SQL Server installation, SQL Server Express is the minimum, though you will need Standard edition if you want to use Analysis Services. If the server is located externally, you will need CLR, SMO and possibly these.
Requirements to run SSIS
As before, most requirements are listed here. Whether you have SQL Server locally or on another server, SSIS requires at least Standard edition, and Enterprise edition for all features. Here is a list of features and the editions they require.
If it is located on another server, you will also need the remote SSIS execution feature: Discovery Hub Remote SSIS Execution package
Features of ADO.Net
- There is no deployment needed for the transfer task
- You can deploy without access to the data source (but not execute)
- You can deploy to and execute against Azure databases
Features of SSIS
- Microsoft best practice
- It has better performance for large tables with many columns and rows (faster and with better memory management)
- Checks the source connection / table / columns on deployment (cannot deploy without access to the source)
Since ADO.Net doesn't need to deploy a transfer package, I will focus only on execution.
I have four different tests.
- I have a large project based on the Discovery Hub model, which you can download from the cube store. It contains 492 tables: 271 in the Business Unit, 146 in the DSA and 75 in the MDW. Many tables run incremental load, so I made two execution packages. One is a full-load package that fully executes the Business Unit, DSA and MDW. The other does not fully execute the tables but is otherwise identical.
- I have a smaller project based on a two-tier architecture with a DSA/staging layer and a DWH. It contains 73 tables: 47 in the DSA and 26 in the DWH. As in the previous test, I have two execution packages set up as DSA, DWH. One runs a full load and one doesn't.
- I execute a single table. It contains about 500,000 rows, and I only focus on the data transfer time.
- I execute a single table. It contains about 500,000 rows, and every time I execute it, I increase the number of rows by 500,000. Here I do not measure the difference in time but the average increase in time per iteration.
I run all tests eight times in a row. The first two tests are split between full and standard loads; the rest are full loads only.
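The measurement in the last test (growing the table by 500,000 rows per run and averaging the per-iteration increase in execution time) can be sketched as follows. This is a minimal Python harness; `transfer_table` is a hypothetical stand-in for the actual data transfer, not part of the original tests:

```python
import time

def transfer_table(row_count):
    # Hypothetical stand-in for the real data transfer;
    # here it just sleeps proportionally to the row count.
    time.sleep(row_count / 10_000_000)

def average_increase(durations):
    """Average per-iteration growth in execution time."""
    deltas = [b - a for a, b in zip(durations, durations[1:])]
    return sum(deltas) / len(deltas)

durations = []
rows = 500_000
for _ in range(8):  # eight runs, growing by 500,000 rows each time
    start = time.perf_counter()
    transfer_table(rows)
    durations.append(time.perf_counter() - start)
    rows += 500_000
```

The point of averaging the deltas rather than comparing absolute times is that it shows whether transfer time scales linearly with row count.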
|Test|SSIS (s)|SSIS (time)|ADO.Net (s)|ADO.Net (time)|Difference (s)|
|---|---|---|---|---|---|
|Discovery Hub average|573.9|0:09:34|456.0|0:07:36|117.9|
|Single table average|12.4|0:00:12|18.4|0:00:18|-6.0|
|Single table double lines|19.6|0:00:20|24.8|0:00:25|-5.2|
So as you can see, SSIS is faster at executing a single table and a small project, but only by five to six seconds on average. ADO.Net is faster at running a large project.
Both projects could be executed much faster if you used Direct Read, with the Business Unit, DSA and MDW on the same server and in the same database.
Good article. This was helpful. We switched to ADO.Net because we were seeing better Data Warehouse to Data Warehouse performance for large tables. It seems to be working well in testing.
Just in case anyone else runs into this: we did see one difference, which was not so obvious, when reconciling our Data Warehouse after an upgrade and making this change.
We noticed that one of our facts was very slightly off. After researching, we determined that when moving a DECIMAL(38,6) into a DECIMAL(38,4), it appears that SSIS floors (or truncates) while ADO.NET rounds. Granted, the data types should have been the same in both databases, and would then have behaved the same with either method, but this highlighted that unexpected conversion differences can occur when data types differ in the data flow.
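The conversion difference described above can be reproduced with plain decimal arithmetic. The sketch below uses Python's `decimal` module with a made-up value (not from the original data) to show how truncating toward zero and rounding produce different DECIMAL(38,4) results from the same DECIMAL(38,6) input:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# A hypothetical DECIMAL(38,6) source value
source = Decimal("123.456789")

# Truncation (what SSIS appears to do when narrowing the scale;
# for positive values, flooring and truncation give the same result)
truncated = source.quantize(Decimal("0.0001"), rounding=ROUND_DOWN)

# Rounding (what ADO.NET appears to do)
rounded = source.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(truncated)  # 123.4567
print(rounded)    # 123.4568
```

A one-unit difference in the last retained digit is exactly the kind of "very slightly off" fact total the commenter describes, and it only shows up when the source and target scales differ.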