Views created in Azure Synapse Serverless SQL pool enable you to query data in your Azure Data Lake without moving the data. Azure Synapse Workspace offers the ability to use T-SQL queries on semi-structured and unstructured data.
These views can be used as an SQL source in the TimeXtender ODX or as an External SQL connection in the TimeXtender Datawarehouse.
In this article you will learn about:
- Use Cases
- View Creation in Azure Synapse Workspace
- Using the Azure Synapse Workspace VIEW as source in TimeXtender
- Prerequisites
- Troubleshooting
Use Cases
- Combining many and/or large files in existing (curated) Azure Data Lake.
- Aggregating many and/or large files in existing (curated) Azure Data Lake.
- Creating a relational structure over disparate raw data stored in Azure Data Lake or Cosmos DB without transforming and moving data.
View Creation in Azure Synapse Workspace
- As a first step (if not existing), let’s create a new database within the SQL Pool using SSMS.
In SSMS, we'll execute the following query to create a new database. If you already have a database you can use that and ignore this step.
CREATE DATABASE TXdatalakeQuery;
- As next step let’s create a view within the database.
With this query example we create using SSMS a simple view on a set of Parquet files:
USE [TXdatalakeQuery]
GO
CREATE VIEW [dbo].[dbo_vNYtaxiTrips20-21] AS
SELECT *
FROM OPENROWSET(
BULK 'https://YOURdatalake.blob.core.windows.net/NYtaxiTripdata/fhvhv_tripdata_*.parquet'
, FORMAT = 'PARQUET') AS c
GO
With this query example we create using SSMS a VIEW on a set of CSV files:
USE [TXdatalakeQuery]
GO
CREATE VIEW [dbo].[dbo_vNYtaxiTrips20-21csv] AS
SELECT *
FROM OPENROWSET(
BULK 'https://YOURdatalake.blob.core.windows.net/NYtaxiTripdata/CSV/fhvhv_tripdata_*.csv'
, FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE
)
WITH (
[hvfhs_license_num] VARCHAR (200) COLLATE Latin1_General_100_CI_AI_SC_UTF8,
[dispatching_base_num] VARCHAR (200) COLLATE Latin1_General_100_CI_AI_SC_UTF8,
[pickup_datetime] datetime2(7),
[dropoff_datetime] datetime2(7),
[PULocationID] int,
[DOLocationID] int,
[SR_Flag] VARCHAR (200) COLLATE Latin1_General_100_CI_AI_SC_UTF8
) AS d
GO
After executing the scripts you can check the structure of the new created views (see screenshot below) in the Synapse Workspace database under the 'views' node.
If the new database and/or views are not listed, please use the refresh action (F5) from the menu.
More examples of data lake queries can be found in this TimeXtender Knowledgebase article: Query ODX Parquet files with Azure Synapse Workspace
Using the VIEW as source in TimeXtender
As a next step, the created VIEW can be used as a data source in TimeXtender. There are multiple ways to use the VIEW as input in TimeXtender.
The first option: Use the TimeXtender Azure Data Factory SQL connector to connect to you Azure Synapse Serverless SQL pool. More info about how to setup Azure Data Factory connectors can be found here: Transfer Data from Source to ODX using Azure Data Factory
Another option is to load the data directly into the Datawarehouse using the ‘external SQL connection’ functionality. More info about how to add external SQL connections can be found in this TimeXtender Knowledgebase article: Tables in TimeXtender
Prerequisites
- Create a Synapse Workspace (MSFT: Create a Synapse Workspace)
- Note: When creating your workspace, make sure to create your own Security credential, rather than the default, sqladminuser.
- Connect SQL Server Management Studio (SSMS) to your workspace (MSFT: Connect Synapse to SSMS)
- Note: When connecting SSMS to your Synapse workspace, make sure to use the SQL on-demand endpoint, rather than the SQL endpoint.
Troubleshooting
Getting this error?
FUNCTION 'OBJECT_SCHEMA_NAME' is not supported.
Module: .Net SqlClient Data Provider
System.Data.SqlClient.SqlException
Read here: Using Synapse Serverless as a datasource in TimeXtender
0 Comments