
Connect to CSV with CData

Note: There is an updated version of this article, which applies to the next generation of TimeXtender.

Before you attempt this you need to add the CSV provider to your list of available sources.

In the business unit we already have two data source providers that can connect to CSV/TXT files, and for the most part I would suggest using those. There are, however, a few cases where this provider is useful, especially because it can connect to a folder containing multiple files with different columns and fields in each of them.

Update

It is now possible to aggregate multiple files with different names, but the same structure, into a single table, with the file name as a field.

Go through this guide before you start.

Add a CData data source

CSV Information

Here is a link to the file I use in this example:

http://mysafeinfo.com/api/data?list=moviesbyearnings2005&format=csv

The file is set up with rows like this.

Year,Rank,Title,Gross $,Month Opened,Theatres
2005,1,Star Wars: Episode III - Revenge of the Sith,380270577.000000,2015-05,3663
2005,2,The Chronicles of Narnia: The Lion the Witch and the Wardrobe,291710957.000000,2015-12,3853
2005,3,Harry Potter and the Goblet of Fire,290013036.000000,2015-11,3858
2005,4,War of the Worlds,234280354.000000,2015-06,3910

CData Setup

Add a CData data source and choose the CSV provider.

The first step in setting up a CData connector should always be to click the ? in the top right corner.

Sp_rgsm_lstegn.PNG

CSV guide from CData

Connecting to one file

You need to save the content of the link in the CSV Information section as a .csv file, in a folder your user has rights to.

I saved mine in C:/Flat Files/Movies2005.csv. Then you just add a link to it in the Data Source field.
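Written out as a connection string, the single-file setup could look roughly like this; the path is the one from this example, and the property name is assumed to match the URI/Data Source field in the dialog.

URI=C:/Flat Files/Movies2005.csv;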

You can then set it up like this.

1.PNG

You just need to add the location of the file in the URI or Data Source field, and then it should work.
Another important setting to change is the Row Scan Depth field.

mceclip0.png

By default this is set to 100, as you can see, but I would change it to 0, which means unlimited; it will then read the whole file before determining the fields' data types.
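If you prefer to keep it in the connection string rather than the dialog, a sketch could look like this, assuming the property name matches the field label.

URI=C:/Flat Files/Movies2005.csv; Row Scan Depth=0;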

Connecting to more than one file

If you have multiple files, you can connect to a folder, like this.

2.PNG
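In connection-string form, pointing at a folder could look like this; the folder path here is just taken from the earlier example.

URI=C:/Flat Files/;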

You can also point to files across many folders. You just need to be sure the following features are turned on.

mceclip0.png

So you need to set Include Subdirectories to True, and if you do not want it to go all the way to the end of the path, you can change Directory Retrieval Depth to another number. -1 means that it has no limit.
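As a sketch, the same two settings written as connection-string properties (names assumed to match the dialog labels) would be:

Include Subdirectories=True; Directory Retrieval Depth=-1;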

Connect to CSV files on an FTP/SFTP server

You also have the option of connecting to an FTP server that contains CSV files. The connection string is as follows.

URI=ftps://localhost:990/folder1; User=user1; Password=password1;

I have an FTP server with a folder that contains some CSV files, and here is how I set it up.

3.PNG

So I added my password to the Password field, my user account to the User field, and in the URI field I added the location of the server. The default FTP port is 21.

If the files are stored on a SFTP server it works similarly, but you need to change the string slightly.

URI=sftp://127.0.0.1:22/folder1; User=user1; Password=password1;

mceclip0.png

So the URI needs to start with sftp:// and the default port is 22 instead.

Connect to files in Azure Storage

There are two methods to connect to Blob storage containers and one to connect to File Shares:

For all of them you need to add the following.

Find the storage account name and add that to the Azure Storage Account field. You also need an Azure Access Key. This can be found here.

mceclip4.png

ABFSS:// Data Lake Gen 2  

Important: specify abfss:// in the URI in lowercase letters.

mceclip3.png

AzureBlob://

mceclip0.png

For both methods, what comes after the // is the same: first the container name, then the folders, and finally the file. You can also choose to point at only the folder if you want.
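As an illustration, with a hypothetical container and folder name, the two URIs could look like this:

abfss://mycontainer/myfolder/Movies2005.csv
azureblob://mycontainer/myfolder/Movies2005.csv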

If your file is stored in a File Share instead of a container, you can use the azurefile:// option.

mceclip2.png

AzureFile://

mceclip1.png
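For a File Share, a hypothetical URI could look like this:

azurefile://myfileshare/myfolder/Movies2005.csv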

The file is not using the default delimiter settings

As long as the file is comma-separated, this is all you need. If it is not comma-separated, there are two options.

1st option

There are two fields you can use to change the default settings.

The first field is the FMT field. Seemingly it has three options.

mceclip0.png

That is actually not the case. As explained in the linked documentation, it can be filled with whatever character you want to use as a delimiter.

Here I overwrote the standard option with a pipe (|) character.

mceclip1.png

When synchronized, this will give the same result as the default comma.
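Written out, the value in the FMT field would be something like this, assuming the field follows the same Delimited(x) notation as the schema.ini Format setting described below:

Delimited(|)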

The other field is the Row Delimiter field.

mceclip2.png

So if your file doesn't use CR LF as the row delimiter, you can add it here.

2nd option

For this example I use the same data again, but downloaded with a pipe as the delimiter:

http://mysafeinfo.com/api/data?list=moviesbyearnings2005&format=csv&delimiter=|

The file still needs to be saved in that folder as C:/Flat Files/Movies2005.csv. Additionally, you need to create a file called schema.ini. It has to be named exactly that, or it won't work.

You can read about the various parameters by searching for csv and schema.ini, but for this we need it set up like this.

[Movies2005.csv]
Format=Delimited(|)

First off, you need to have the full file name as the first line, in square brackets. Then you need to add each setting that differs from the default, mainly delimiters and such.

Then, when you synchronize, it looks at this file and finds the correct delimiter. This also works for multiple files; they just need to be added below with the same setup.

Like this for a folder.

[Company.csv]
Format=Delimited(;)

[Customer.csv]
Format=Delimited(;)

[Date.csv]
Format=Delimited(;)

[Item.csv]
Format=Delimited(;)

[Sales Invoice Line.csv]
Format=Delimited(;)

If you are connecting to files on an FTP server, the schema.ini file also needs to be added to that folder.

Getting the data

After setting up the provider, all that is needed is to synchronize/read the data source.

After that you can set up incremental load and apply data type overrides.

If your data doesn't come out as you expect, go through the CData guide and see if you can turn on a setting that will give you the correct result.

Aggregate multiple files into one

A new feature of the CSV provider is the option to aggregate the files in a folder into one.

I have a folder with some files, each containing data for a specific month.

Other than that, they are exactly the same. This is important for the feature to work: the files must have exactly the same field names. If not, there are some options, which are presented in the Issues section below.

I connect to the files and use the following settings.

mceclip0.png

URI: Points to the folder where the files are stored. As described before, this could also be a folder on an FTP/SFTP server.

Aggregate Files: Needs to be set to True for the CSV provider to attempt to merge the files.

Specifically for this example, I also change the default Row Scan Depth to 10000. This is because one field had no decimal points for at least the first 100 rows, but there was one in a row beyond that number.
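Put together, the relevant settings for this aggregation example could be sketched like this; the folder path is hypothetical, and the property names are assumed to match the dialog labels.

URI=C:/Flat Files/Monthly/; Aggregate Files=True; Row Scan Depth=10000;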

When I synchronize the data source it will generate a table called AggregatedFiles.

mceclip2.png

It contains data from all 10 files with the URI field showing the file name.

mceclip3.png

Using an asterisk to point at specific files

It may be that you have more than one type of file in the folder you aggregate, for example some files with a date as part of their names, but also files that do not contain the same fields and data types, like so.

mceclip1.png

You only want to merge the Orders files, not the others. To do this, you change the Include Files field to contain the name, and add an asterisk for the part you want to ignore, like so.

mceclip2.png
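Written out, the Include Files value might look something like this; the exact file name pattern is assumed from the description above.

Orders*.csv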

Now it will only aggregate the two Orders files and not the others. You can also combine this with the Include Subdirectories feature to do it across many folders. It also works for all the special connections such as FTP, Azure, and similar.

Issues

If you cannot guarantee that the field names are always the same, you can set it to ignore the field names and skip the first row in each file, such as below.

mceclip5.png

Another option is to point to a specific Metadata file in the Metadata Discovery URI field and pull the info from there.

The above settings will give this result, where all field names are called Col1 and so forth.

mceclip6.png

Troubleshooting data types

The CData provider attempts to detect data types, but sometimes it may run into errors. There are a few workarounds for quick data extraction:
1. Row Scan Depth
The number of rows to scan when dynamically determining columns for the table.
Set it to 0 if the source file is not too big; setting this value to 0 (zero) will parse the entire CSV document.
Increasing Row Scan Depth (above 100) may help in some cases.

2. Data Type Override
Manually set up rules to override specific data types.

3. Set Type Detection Scheme = “None” in the provider setup dialog options
preview-full-image.png
This will bring in all data as string type (varchar). You may convert it to other types later as needed.

This technique may work in any CData provider that exposes the Type Detection Scheme property.

Dates are shown in the wrong timezone

Sometimes your datetime fields will have hours added or subtracted. This is due to the values being converted to your time zone. The CData provider expects all dates to come in UTC and adds hours according to the settings on your computer. It even accounts for daylight saving time.

If you do not want this to happen, you need to add this option to the Other field.

mceclip0.png

ConvertDateTimeToGMT
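As mentioned in the comment thread below, the option takes a True/False value; based on the explanation above, keeping the values in UTC would presumably mean setting it like this in the Other field:

ConvertDateTimeToGMT=True;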

Convert the character set

UPDATE - The following is no longer necessary; there is now a Character field you can set.

mceclip3.png

If your CData release is a legacy version, you can still set it as shown below.

If you have a CSV file in a specific encoding, you can convert it by adding this to the Other field.

Charset=windows-1252

You can also use Charset=CP1252, which gives the same result, and other encodings can be used as well.

mceclip0.png

The files were made with different culture settings

If your files were created with specific culture settings, you can use the two fields Culture and Client Culture to align them to a specific one.

culturesettings.gif

The list of available options is here.

So if my files are set up with Dutch culture settings and I want to convert them to US settings, I would add nl-NL in the Client Culture field and en-US in the Culture field. Then it should align the values when you synchronize.
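Sketched as settings, assuming the property names match the field labels shown in the dialog, that would be:

Client Culture=nl-NL; Culture=en-US;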

 

 


9 Comments

  • Hansen, Mikkel Hecht DK - NOH

    Hi, can you create a connection to a file folder and only include some, but not all, of the files in the folder, using some kind of wildcard, regular expression, or other? Like File_Actuals_*.CSV or similar?

  • Nils Östborg

    Good question, Mikkel! And also, can you get the file name as part of the data? Other things like file size and OS timestamp? Thanks

  • Lennaert van den Brink

    Better late than never: yes, you can give a mask under the option "include files". For example, you can do [include files] = file_*.csv and it will load all CSV files that start with "file_"

  • Rik Miltenburg

    Hi,

    When I'm reading my csv files with the CData source it gives me the "SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM." error. Is there a way in the CData source settings, similar to the Oracle data source settings, to convert out-of-range dates?

  • Thomas Lind

    Hi Rik

    No, there is no option like that.

    You can try two things. 

    1. Convert the field to a datetime2 using the data type override feature. A datetime2 is allowed to be outside the normal range set by a normal datetime.
    2. Or be sure that the field is read as a string and then convert it to an allowed date when it is added to the DSA.
  • Bart Peek

    Is it possible to have the Data Source delete the files after the processing has been completed?

  • Thomas Lind

    Hi Bart

    No, there are no options for post-transfer changes.

    My suggestion is to use a bat file that removes or moves the files from the folder you look in.

    cd "C:\FlatFiles\GL\"
    del *.csv

    You can run a job with a query in a script action. The job can run the batch file. You can then set it to run the script as a post-transfer job.


  • Sierd

    Hi,

    Is it possible to use ConvertDateTimeToGMT inside an RSD file for JSON connections? 

     

     

  • Frank Wagner

    Hi Sierd,

    yes, you can set the ConvertDateTimeToGMT option within the RSD file like this:

    <api:set attr="Other" value="ConvertDateTimeToGMT=False;" />

    or

    <api:set attr="Other" value="ConvertDateTimeToGMT=True;" />

    respectively.

    /Frank
