Sql Server 2012 Integration Services Master Child Pattern

The child extract package pattern in Integration Services 2012

Index

Creating a simple Master Child pattern within Sql Server 2012 Integration Services
The Master package for loading child packages in Integration Services 2012
The child extract package pattern in Integration Services 2012
The child dimension load pattern, extending the Slowly Changing Dimension
A pattern for loading fact tables with inserted, updated and deleted data
Project Deployment with Sql Server 2012 Integration Services

Introduction

This post details the Child Extract Pattern, the sole purpose of this pattern is to extract data from the source system and insert it into a staging table. The source of data can be database, or file based (maybe via ftp). There may no longer be in-house expertise on how to extend or develop these source systems. For these reason it's often best just to extract the data "as-is", once it's in a staging table, the data transformations can be executed.

Child package responsibilities for ETL process

This package simply loads the data into a very loosely specified table, the looser the better in terms of a successful extraction. A staging table with each column defined as nvarchar(255) null is completely acceptable. The load package, via a transformation view takes care of the data manipulation, casting and converting.

The Extract Control Flow

Control flow for extract package pattern

Stepping through each task in the control flow (above) we have

Load package specific variables (Execute SQL Task)

Each Extract (or Load) package may have package specific variables required to execute the package. Generally these would be a date range, a region or similar e.g. Only load the last 3 months worth of data. These variables are stored within the ETL_Admin database in the dbo.PackageVariable table and are loaded by PackageId. These variables must already exist in the package and are matched by name.

Load package variables from ETL Admin Database - General Load package variables from ETL Admin Database - Parameters Load package variables from ETL Admin Database - Resultset

Assign Package Variables (Execute Script Task)

The script below shows how the variables are assigned to. Prefixing the variables with pv_ is a visual indicator of which are loaded from the database. In this example I'm loading USA Employees, only which is configured with the pv_Country variable. I will use this variable to build an Sql Select statement

/// <summary>
/// This method is called when this script task executes in the control flow.
/// Before returning from this method, set the value of Dts.TaskResult to 
/// indicate success or failure. To open Help, press F1.
/// </summary>
public void Main()
{
    try {
        var table = ((DataSet)Dts.Variables["PackageVariables"].Value).Tables[0];

        if (0 != table.Rows.Count) {
            foreach (DataRow row in table.Rows) {
                if (Dts.Variables.Contains(row["Name"].ToString())) {
                    Dts.Variables[row["Name"].ToString()].Value = row["Value"].ToString();
                }
            }
        }
        Dts.Variables["ExtractedOn"].Value = DateTime.Now;

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch {
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}

Truncate data from staging (Execute Sql Task)

Truncates the staging table of previously loaded data. Note, that truncate deletes all rows without writing individual deletes to the log making it fast and efficient

Build source select statement (Execute Script Task)

This is an optional task which allows you to dynamically build your Sql Statement. This is then used as the source in the dataflow. If there is no requirement for dynamic sql, the statement can be entered directly into the connection manager and this task disabled.

/// <summary>
/// This method is called when this script task executes in the control flow.
/// Before returning from this method, set the value of Dts.TaskResult to 
/// indicate success or failure. To open Help, press F1.
/// </summary>
public void Main()
{
    // TODO: Add your code here
    Dts.Variables["SqlSelect"].Value = string.Format("select EmployeeID, LastName, " +
        "FirstName, Title, TitleOfCourtesy, HireDate from dbo.Employees " +
        "where Country='{0}'", Dts.Variables["Country"].Value);

	Dts.TaskResult = (int)ScriptResults.Success;
}

The Extract Data Flow

The child extract pattern data flow

Select source data (OLE DB Source)

Depending on whether you elect to use dynamic sql, a variable or sql statement is entered directly. In the case of a variable a dummy statement is required for the save validation. Here I have use a similar statement (with the same column names) but ensure it returns no data by default, rather than returning an unbound resultset which could pull down server resources.

Source Connection Manager

At runtime this statement will be overwritten

Row Count (Row Count)

Assigns the row count to the row count variable

Add Derived Columns (Derived Column)

Adds metadata columns of when the data was extracted and from where

Add derived columns task editor

Insert into staging (OLE DB Destination)

Finally insert the (meta) data into the stage_Employee table using a OLE DB Destination

OLE DB Destination Mappings for staging data

Jumping back out to the Control flow, the final task is

Update parent variables (Execute Script Task)

This task Assigns the Rowcount variable declared in the child package, to the RowsRead variable in the Master package. Variable scoping allows child packages to see Master package variables, just ensure you remove the User:: prefix.

Update parent variables - Execute Script Task

Summary

Running the Master package, we can see we have loaded 5 Employee records into a staging database!

Results from the stage_Employee table

I have also created an Orders extract package to demonstrate multiple packages running at once. Next up is the Child Load Package which is responsible for transforming and loading delta records into your warehouse.

Source Code

git clone https://github.com/mindfulsoftware/SSISMasterChildPattern.git