Friday, 16 September 2016

Using SAS as a source in SSIS

I want to extract data from a SAS database file (*.sas7bdat). How do I do that in SSIS?

This is possible but not out of the box. You need to install an extra provider to accomplish  this.

1) Download SAS Provider
First you need to download and install the SAS Providers for OLE DB. There are multiple versions make sure to download the correct version (otherwise you get error messages like "This application does not support your platform"). You only need the select SAS Providers for OLE DB.
Install SAS Providers for OLE DB

2) Setup OLE DB Connection Manager
After installation the new provider will be available in OLE DB Connection Manager editor. Make sure to choose "SAS Local Data Provider X.X". This is the provider that can read SAS database files (*.sas7bdat).
SAS Local Data Provider 9.3

Second import step in the setup is to select the folder where the sas7bdat files are located. Don't select a file! All files will appear as tables in the OLE DB Source component. In my case I could leave the User name and Password fields empty because I already had access to the folder (but I'm not an SAS expert).
Fill in folderpath in Server or file name field

3) Setup OLE DB Source Component
Now you can use a regular OLE DB Source Component to extract data from SAS. However there are two concerns. When you select a table and close the editor you will get a warning that there is something wrong with the code page.
Cannot retrieve the column code page info from the OLE DB provider.
  If the component supports the "DefaultCodePage" property, the code page
from that property will be used.  Change the value of the property if the
current string code page values are incorrect.  If the component does not
support the property, the code page from the component's locale ID will
be used.

After clicking OK there will be a warning icon in the OLE DB Source Component which you can remove by setting the "AlwaysUseDefaultCodePage" property on true.
Before and after changing AlwaysUseDefaultCodePage

The second concern is more annoying: all datatypes will be DT_SRT (ansi string) or DT_R8 (float). You cannot change this and you need to add a data conversion.
Date(times) are also numbers: dates will be a number of days after January 1 1960 and datetimes will be the number of seconds after January 1 1960 and any decimals are used for milliseconds. A Derived Column expression for date could look something like:
DATEADD("DD", (DT_I4)[mydatecolumn], (DT_DATE)"1960-01-01")
All string or float

Tip: you can also use BIML to create SSIS packages with a SAS7BDAT source.

Using SAS as a source in BIML

I recently created packages with a SAS source, but now I want to use the same SAS source in my BIML Script. But I'm getting an error that the Local Provider doesn't support SQL. How can I solve this?
Error 0 : Node OLE_SRC - DIM_TIJD:
Could not execute Query on Connection PROFIT1:
The Local Provider does not currently support SQL processing.

There is NO easy solution for this. The provider doesn't support SQL Queries and that's what the BIML engine does first to get the metadata from the source table. Luckily there is a search-and-replace workaround. A lot of extra work, but still much easier then creating all packages by hand!

1) mirror database in SQL server
I used the metadata from SAS to get all tables and columns which I then used to create (empty/dummy) SQL Server tables with the same metadata as SAS (The datatype is either varchar of float). The tool to get the SAS metadata is SAS Enterprise Guide. It lets you export the metadata to for example Excel and then you can use that to create the dummy tables.
A little script created by a SAS developer to get metadata

Metadata export example in Excel

Instead of the SAS OleDB connection manager I used a temporary SQL Server OleDB connection manager, but I also kept the SAS OleDB connection manager in my BIML code and gave both the same name with a different number at the end (easier to replace later on).
BIML Connection Managers

Because the SAS OleDB connection manager isn't used in the BIML code it won't be created by the BIML engine. To enforce that, I used a second connections tag between </Tasks> and </Package>. It also lets me give them nearly the same GUID (easier to replace later on).
BIML Force create connection managers

The end result of the BIML script:
  • A whole bunch of packages that use the SQL Server database as a source (instead of SAS DB)
  • Two connection managers with nearly the same name and GUID (SAS OleDB and SQL OleDB)

3) Search and Replace
Now you must open all generated packages by using View Code (instead of View Designer). When all packages are opened you can use Search and Replace to change the name and GUID in all packages. Make sure you don't replace too much that could damage your generated packages. Then save all changes and close all packages. Next open your packages in the designer to view the result.

Tip: you can use also the same metadata (and a big if-then-else construction) to create a derived column in BIML that casts all float-columns to the correct datatypes (int, date, decimal, etc.).

Related Posts Plugin for WordPress, Blogger...