
Data Warehouses

Note: There is an updated version of this article, which applies to the next generation of TimeXtender.

A data warehouse in TimeXtender is a SQL Server database, on-premises or in Azure, where your data is stored for queries and analysis. Often, a TimeXtender project consists of one data warehouse where you consolidate data from one or more staging databases and a number of data sources.

During the execution of a project, TimeXtender extracts data from the staging database or ODX data storage and transfers it to the data warehouse. Initially, the data resides in what is known as raw table instances in the data warehouse. TimeXtender applies data transformations and data cleansing rules and saves the resulting data in the valid instances of the tables, ready for queries and analysis.
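Conceptually, the raw and valid instances behave like two copies of each table: data lands in the raw instance unchanged, and the transformation and cleansing rules produce the valid instance. A minimal sketch of that pattern (plain Python; the table name and cleansing rules are hypothetical examples, not TimeXtender's actual implementation):

```python
# Illustrative sketch of the raw -> valid instance pattern.
# Table name and rules are made-up examples, not TimeXtender internals.

raw_customer = [
    {"CustomerID": 1, "Name": " Alice ", "Country": "dk"},
    {"CustomerID": 2, "Name": None,      "Country": "DE"},
]

def cleanse(row):
    # Example transformations: trim names, replace missing names,
    # upper-case country codes.
    return {
        "CustomerID": row["CustomerID"],
        "Name": row["Name"].strip() if row["Name"] else "Unknown",
        "Country": row["Country"].upper(),
    }

# The valid instance holds the cleansed result, ready for queries.
valid_customer = [cleanse(row) for row in raw_customer]
```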

Adding A Data Warehouse

  1. In the Solution Explorer, right-click Data Warehouses, and click Add Data Warehouse.
  2. In the Name box, type a name for the data warehouse. The name cannot exceed 15 characters in length.
  3. In the Server name box, type the name of the server where you want to store the database. If it is a named instance, type the server name and the instance name in the format server\instance. Click the ellipsis (...) to choose one of the available servers in your local Active Directory, if any.
  4. In the Authentication list, click the mode of authentication you want to use. You have the following options:
    • Windows Authentication: Use the logged-in Windows user's credentials for authentication.
    • SQL Server Authentication: Use a login set up on the SQL Server. Enter the username and password in the corresponding fields.
    • Azure AD Password Authentication: Use Azure AD credentials from a domain that is not federated with Azure AD. Enter the username and password in the corresponding fields.
    • Azure AD Integrated Authentication: Use the logged-in Windows user's credentials for authentication, provided that the user is logged in with Azure AD credentials from a domain that is federated with Azure AD.
  5. In the Database box, select an existing database, or type the name of a new database, and then click Create.
  6. (Optional) In the Collation list, click on the collation you want to use. You can choose a specific collation or one of the following:
    • <Application Default>: Uses the application default collation.
    • <Server Default>: Inherits the collation from the specified server.
    It is best practice to use the same collation for the staging database, data warehouse database, and any SSAS Multidimensional databases.
  7. (Optional) In the Connection timeout box, enter the number of seconds to wait before terminating the attempt to connect to the server. Set it to 0 to wait indefinitely.
  8. (Optional) In the Command timeout box, enter the number of seconds to wait before terminating a command.
  9. In the Max. rows to copy box, enter the batch size to use for ADO.net transfers. Enter 0 for unlimited.
  10. (Optional) In the Encrypt connection list, you can enable encryption of the connection, which is recommended when you are not in a private network (e.g. when your server is on Azure). You have the following options:
    • No: The communication is not encrypted (default).
    • Yes: The communication is encrypted. The server's certificate is verified by a certificate authority.
    • Yes, trust server certificate: The communication is encrypted, but the server's certificate is not verified. This setting is not recommended for use on public networks.
  11. (Optional) In the Direct read list, you can enable direct read from the staging database(s) to the data warehouse database. With direct read enabled, data is transferred using a stored procedure containing a simple SELECT statement. Especially when TimeXtender is not on the same machine as the SQL Server, this can give better performance than SSIS or ADO.net, since transfers using those technologies pass through TimeXtender. For direct read to work, some prerequisites must be met: on SQL Server, the databases need to be on the same server; on Azure SQL Database, the staging area and data warehouse need to be in the same database. You have the following options for direct read:
    • Disabled
    • Matching Server: Direct read is used if the data warehouse and staging database server names match.
    • Matching Server and Database: Direct read is used if the data warehouse and staging database server names and database names match.
  12. In the Use SSIS for transfer list, click Yes to enable SQL Server Integration Services data transfer, No to disable it, or leave it at As Parent to respect the project setting.
  13. (Optional) Click SSIS Settings to configure SSIS.
    • In the Local SSIS version list, click on the SSIS version that is installed on the local machine. Automatic (default) will use the SSIS version corresponding to the data warehouse server and try to connect to the Package SQL Server through the data warehouse SQL Server. In this case, you do not need to enter authentication information.
    • Enter a server name in the Package SQL Server box. This server is used for storing the SSIS packages generated by TimeXtender. If you do not enter a value in this field, the packages are stored on the database server.
    • In the Authentication list, click on the type of authentication you want to use. Enter a Username and Password if required by the authentication type you have selected.
    • Select Enable remote SSIS execution to enable the execution of SSIS packages on a remote server and enter the details for the remote server below. Note that you will need to install the Remote SSIS Execution service on the remote server. It can be downloaded from the support site, https://support.timextender.com.
  14. In the Simple Mode list, click Enabled if you want to set all tables in the data warehouse to Simple Mode. See Simple Mode for more information.
  15. Click Test Connection to verify that the connection is working.
  16. Click Advanced Settings to access the advanced settings for the data warehouse. These are all optional.
  17. If you want to add additional connection strings, enter them in the Connection String Properties box.
  18. If you are deploying your data warehouse on Azure Synapse Analytics (formerly Azure SQL Data Warehouse), click on SQL Data Warehouse in the Deployment target list. For other versions of SQL Server, the version and edition can usually be auto-detected, and the setting can be left at its default. However, you can use the option to set the deployment target explicitly if you experience issues. Note: If you use Azure Synapse Analytics, some options and settings are not available due to the differences between this and other flavors of SQL Server. However, additional table settings are available to control Azure Synapse Analytics-specific options. The differences are noted when applicable throughout the user guide.
  19. See Adding a Database Role for an explanation of the Drop role options setting.
  20. Under Transfer from ODX, select the method you want to use. The following settings are available:
    • ADO.net: The application-native method.
    • Azure Data Factory: Uses Azure Data Factory, which can be faster if the ODX data storage and data warehouse are both on Azure.
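Several of the steps above (server, database, authentication, timeouts, and encryption) correspond to standard SQL Server connection string keywords. The sketch below shows roughly how the dialog choices translate; it is an illustration using common ADO.NET-style keywords and made-up server and database names, not TimeXtender's actual connection code:

```python
def build_connection_string(server, database, auth="windows",
                            user=None, password=None,
                            encrypt=False, trust_server_cert=False,
                            connection_timeout=15):
    """Hypothetical helper mapping the dialog options to ADO.NET-style
    connection string keywords. Illustrative only."""
    parts = [f"Server={server}", f"Database={database}"]
    if auth == "windows":
        parts.append("Integrated Security=SSPI")   # Windows Authentication
    elif auth == "sql":
        # SQL Server Authentication: username and password fields
        parts += [f"User ID={user}", f"Password={password}"]
    if encrypt:
        parts.append("Encrypt=True")               # 'Yes' in the Encrypt connection list
        if trust_server_cert:
            # 'Yes, trust server certificate': skip certificate verification
            parts.append("TrustServerCertificate=True")
    parts.append(f"Connect Timeout={connection_timeout}")  # Connection timeout box
    return ";".join(parts)

# Example: Azure-hosted server, so encryption is recommended.
cs = build_connection_string("myserver.database.windows.net", "DWH",
                             auth="sql", user="dwh_user", password="secret",
                             encrypt=True)
```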

Cleaning up the Database

To prevent accidental data loss, deleting a table in TimeXtender does not delete the physical table in the data warehouse or staging database or, if you use SSIS transfer, the SSIS package that was created to copy data. The downside is that tables deleted in TimeXtender still take up space in the database.

Identifying and Deleting Unused Tables

The SQL Database Cleanup Tool enables you to identify tables left behind by TimeXtender and delete (drop) them to free up space. Note that database schemas are not deleted from the database. You will need to drop those manually in SQL Server after deleting them in TimeXtender.

Warning: When you drop a table with the SQL Database Cleanup Tool, it is permanently deleted. Take care when you use the tool.

To clean up your data warehouse or staging database, follow the steps below.

  1. Right-click a data warehouse or staging database, click Advanced, and click SQL Database Cleanup Tool. TimeXtender will read the objects from the database and open the SQL Database Cleanup Tool window.
  2. The objects in the database that are no longer, or never were, part of the currently opened project are listed.
  3. (Optional) Right-click a table, view, procedure, or function and click Script to display the SQL script behind the object.
  4. Right-click a table, view, procedure, or function, and click Drop to drop the object from the database.
    1. If the item does not have subordinate items, click Yes when TimeXtender asks you to confirm the drop.
    2. If the item has subordinate items, a window will open with a list of the objects that will be dropped. Clear the selection for any tables you want to keep and then click Drop.
      Note: TimeXtender will automatically clear the selection for any incrementally loaded tables to prevent accidental data loss. TimeXtender will ask you to confirm if you want to drop an incrementally loaded table.
  5. When you have dropped all the objects you want to delete from the database, close the window.
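At its core, finding objects that are no longer part of the open project is a set difference between the database catalog and the project's object list. A minimal sketch of that idea (plain Python on hypothetical object names; the real tool reads the objects from the database itself):

```python
# Hypothetical object lists; the real tool reads these from the database
# and from the currently opened TimeXtender project.
objects_in_database = {"dbo.Customer", "dbo.Customer_R",
                       "dbo.OldOrders", "dbo.OldOrders_R"}
objects_in_project  = {"dbo.Customer", "dbo.Customer_R"}

# Objects that are no longer (or never were) part of the project
# are the candidates the cleanup tool lists for dropping.
unused = sorted(objects_in_database - objects_in_project)
```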

Identifying and Deleting Unused SSIS Packages

With the SSIS Packages Cleanup Tool, you can delete the SSIS packages that are left when you have deleted a table, stopped using SSIS for transfer, or made other changes that make some packages unnecessary.

To clean up SSIS packages, follow the steps below.

  1. In the Solution Explorer, right-click your project, click Advanced, and click SSIS Packages Cleanup Tool. A list of unused packages is displayed.
  2. Select the packages you know are safe to delete, and click Delete.

Warning: When you delete a package with the SSIS Packages Cleanup Tool, it is permanently deleted from the server. Take care when you use the tool.


1 Comment

    Rasmus Remmer Bielidt
    Regarding #11 - Direct Read option

    This is a great option and works well out of the box as long as you don't use incremental load tables between the stage and warehouse.

    If you are using incremental loads, however, this option actually makes performance worse. Because the incremental load variable (@fullLoadIncrementalRule) is set during execution of the stored procedure rather than passed in as a parameter, the query optimizer does not know its value at compile time and assumes about a third of the rows will be returned. Under that assumption, a clustered index scan on the staging table looks cheaper than a non-clustered index seek plus key lookup, so the optimizer picks the scan. In my use cases, that estimate is always wildly inaccurate, since only a tiny fraction of the rows in the staging table usually need to be extracted.

    The fix, however, is straightforward: add OPTION (RECOMPILE) to the statement so that the query optimizer recompiles the query after the variable has been set.

    I run this script after each deployment. It adds the OPTION (RECOMPILE) hint to all %usp_CrDM% procedures that contain the @fullLoadIncrementalRule variable, which are the ones performing poorly.

    USE [warehouse_db_name];

    -- Ensure that all direct-read procedures on incremental tables
    -- end with OPTION(RECOMPILE).

    DECLARE @optionrecompile nvarchar(18) = NCHAR(10) + N'OPTION(RECOMPILE)',
            @alter_procedure_statement nvarchar(max);

    DECLARE csr CURSOR LOCAL FAST_FORWARD FOR
        -- Replace the leading CREATE with ALTER and append the hint.
        SELECT STUFF(OBJECT_DEFINITION(sp.object_id) + @optionrecompile, 1, 6, N'ALTER')
        FROM sys.procedures sp
        WHERE sp.name LIKE '%usp_CrDM%'
          AND sp.type = N'P'
          AND OBJECT_DEFINITION(sp.object_id) LIKE '%@fullLoadIncrementalRule%'
          AND RIGHT(OBJECT_DEFINITION(sp.object_id), 18) <> @optionrecompile;

    OPEN csr;
    FETCH NEXT FROM csr INTO @alter_procedure_statement;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC (@alter_procedure_statement);
        PRINT 'Alter statement processed: ' + NCHAR(13) + NCHAR(10)
              + @alter_procedure_statement + NCHAR(13) + NCHAR(10);
        FETCH NEXT FROM csr INTO @alter_procedure_statement;
    END;

    CLOSE csr;
    DEALLOCATE csr;


    Edited by Rasmus Remmer Bielidt