Selecting, Validating and Transforming Data
Selecting the right data from the source, validating it and transforming the data if needed are central parts of the data warehouse process.
In TimeXtender you specify data selection rules to ensure that only the data needed for your analysis is extracted from the data source to the staging database.
On the staging database, you perform data cleansing by applying validation and transformation rules to the data. This ensures that only valid data is loaded into the data warehouse.
However, you can also apply selection, validation and transformation rules on a data warehouse. This is useful when you have moved data from different business units into the data warehouse and want to ensure the validity of the consolidated data.
Operators for Selecting and Validating Data
When defining a data selection or validation rule, you can use the operators listed below.
Values must be either integers or letters. You can also specify a list of values by entering comma-separated values.
Operator |
Definition |
---|---|
Not Empty |
Selects records where the value of a field is not empty or NULL |
Equal |
Selects records where the value of a field is equal to the specified value |
Greater Than |
Selects records where the value of a field is greater than the specified value |
Less Than |
Selects records where the value of a field is less than the specified value |
Not Equal |
Selects records where the value of a field is not equal to the specified value |
Greater or Equal |
Selects records where the value of a field is greater than or equal to the specified value |
Less or Equal |
Selects records where the value of a field is less than or equal to the specified value |
Min. Length |
Selects records that contain at least the specified number of characters |
Max. Length |
Selects records that contain no more than the specified number of characters |
List |
Selects records where the value of a field is equal to one of the specified comma separated values |
Empty |
Selects records where the value of a field is empty or NULL |
Not in List |
Selects records where the value of a field is not equal to one of the specified comma separated values |
Like |
Selects records where the value of a field is similar to the specified value. A percent sign (%) can be used as a wildcard. For instance, "ABC%" will return all records where the value in the specified field starts with "ABC". |
Not Like |
Selects records where the value of a field is not similar to the specified value. A percent sign ( % ) can be used as a wildcard. For instance, "ABC%" will return all records where the value in the specified field does not start with "ABC". |
Data Selection Rules
Data selection rules are used to specify a set of conditions that data extracted from a source table must satisfy. By applying selection rules, only the subset of data that you actually need is loaded into the data warehouse or staging database.
On data warehouses you can add data selection rules on both the table level and the source table level. If more than one source table delivers data to a data warehouse table, you can set up different rules for each source table, but you can also set up general rules.
On Qlik models, data selection rules are set on the source table level.
You can add usage conditions to selection rules based on project variables. This enables you to e.g. load less data in a development environment than in the production environment.
Adding a Data Selection Rule
- Expand the business unit that contains the data source you want to apply the selection rule to, expand Data Sources and then expand the relevant data source.
- OR -
Expand the data warehouse that contains the table you want to apply the selection rule to and expand Tables. If you want to apply the rule to a specific source table, expand the table and Mappings as well.
- OR -
Expand the Qlik model that contains the table you want to apply the selection rule to and expand Tables. If the table is concatenated, expand Mappings as well. - Right-click the table, or source table, you want to add the selection rule to and click Add Data Selection Rule.
The Data Selection pane appears in the right hand side of the window. - Click the field you want to use in the selection rule.
- In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
- If applicable, type a value for the operator in the Valuebox.
- Click Add.
All selection rules that you have applied to a table are displayed below the relevant table.
Adding a Usage Condition to a Selection Rule
To add a usage condition to a selection rule based on a project variable, follow the steps below.
- Right click a selection rule and click Add Usage Condition. The Usage Condition panel is displayed in the right hand side of the application window.
- In the Usage Condition panel, click the variable you want to use.
- In the Operator list, click the operator you want to use. You have the following options:
- Equal
- NotEqual
- GreaterThan
- LessThan
- GreaterEqual (Greater than or Equal to)
- LessEqual (Less than or Equal to)
- In the Comparer list, click the general data type of the variable, which TimeXtender will use in the comparison. You have the following options:
- String
- Date
- Numeric
- Type the value you want to compare the parameter with in the Value box.
- Click Add to add the usage condition.
For more information about project variables, see Project Variables.
Data Validation Rules
Validation rules ensure a high level of accuracy and reliability of the data in the data warehouse and are used to discover invalid data. You can apply validation rules at the field level in the staging database or at field level in the data warehouse.
While data is cleansed on the staging database, it often has to be cleansed again if you have consolidated data from different business units in the data warehouse.
You can make a validation rule conditional if you want the rule to apply in specific situations only.
For each validation rule you apply to a field, you must also classify the severity of a violation. The following classifications are available:
Severity | Definition |
---|---|
Warning | The violation is not critical to the data quality and does not require immediate attention. The data is considered valid and will still be made available to the end users. |
Error | The violation is critical to the data quality and requires immediate attention. The data is considered invalid and will not be made available to the end users. |
Adding Data Validation Rules
You can add any number of validation rules to a field.
- Expand the business unit that contains the data source you want to apply the validation rule to, expand Data Sources, expand the relevant data source and expand the relevant table.
- OR -
Expand the data warehouse that contains the table you want to apply the selection rule to, expand Tables and expand the relevant table. - Right-click the field, you want to apply the validation rule to, and click Field Validations. The Field Validations pane appears in the right-hand side of the window.
- Click the field you want to use in the validation rule.
- In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
- If applicable, type a value for the operator in the Value box.
- Click Error to specify that as the severity level or leave it at Warning.
- Click Add to add the rule.
Adding Conditions
You can add any number of conditions to your validation rules. Follow the steps below to add a validation rule.
- Locate the selection rule you want to modify.
- Right-click the rule and then click Add Condition. The Condition pane is displayed.
- In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
- In the Value field, type the value you want to use in the comparison.
- OR -
Click Field and click the field, you want to use in the comparison. - Click Add to add the condition to the rule.
The condition is displayed below the validation rule or transformation rule it belongs to.
To View Validation Errors or Warnings
- On the Reports menu, click Errors or Warnings.
- In the Database list, click the database that contains the table you want to view errors or warnings for.
- In the Table list, click the relevant table. The No. of rows box displays the number of errors or warnings in the table and the rows that violate the rules are displayed in the pane below.
- Click any row to display the error or warning message in the Error Message or Warning Message box.
Data Transformation
Fields transformations lets you modify existing data in a number of ways. You can, for example, easily reverse the sign of numeric values, trim fields or return a specified number of characters from the original field value.
Adding Field Transformation Rules
- Expand the business unit that contains the data source you want to apply the validation rule to, expand Data Sources, expand the relevant data source and expand the relevant table.
- OR -
Expand the data warehouse that contains the table you want to apply the selection rule to, expand Tables and expand the relevant table. - Right-click the field, you want to add a transformation rule to, and then click Field Transformations. The Field Transformation pane appears. In the pane, click the field you want to add a transformation to.
- In the Operator list, click the operator, you want to use, and then click Add.
Operator Description Upper Converts all text values to upper-case Lower Converts all text values to lower-case First Returns the number of beginning characters specified by the user Last Returns the number of ending characters specified by the user TrimLeft Trims padded spaces from the left of the data TrimRight Trims padded spaces from the right of the data Trim Trims padded spaces from the left and right of the data Fixed Inserts a fixed value that is specified by the user Custom Allows for custom SQL code to be executed ReverseSign Reverses the sign for numeric values TimeOnly
Returns only the time portion of a datetime field
DateOnly
Returns only the date portion of a datetime field
Replace
Replaces one set of characters with another
- If you have selected First or Last as the operator, enter how many characters you want to include in the Length field.
Adding Conditions
You can add conditions to transformation rules in the same way that you add conditions to validation rules. See Adding Conditions
0 Comments