Sql Server 2012 Integration Services Master Child Pattern

The child dimension load pattern, extending the Slowly Changing Dimension

Index

Creating a simple Master Child pattern within Sql Server 2012 Integration Services
The Master package for loading child packages in Integration Services 2012
The child extract package pattern in Integration Services 2012
The child dimension load pattern, extending the Slowly Changing Dimension
A pattern for loading fact tables with inserted, updated and deleted data
Project Deployment with Sql Server 2012 Integration Services

Introduction

The child load pattern is used by both dimension and fact table loads, sourcing the data from the staging transform views and loading it into the warehouse. The warehouse design dictates that the dimension tables (which are referenced by fact tables) are loaded first.

This post shows how to extend a slowly changing dimension load, to include an EffectiveFlag column for ease of querying, as per a Kimball recommendation. The only gotcha is that this extension will be overwritten each time the wizard is run.

The child load package control flow

All child load packages have the same control flow, resetting the package counter variables, running the data flow and finally updating the parent variables in the master package.

The Dim_Employee control flow

Reset Row Counters (Execute SQL Task)

By selecting "0" from an OLE DB connection, this single row, single column resultset can be assigned to each of the package counter variables. These variables are incremented within the the dataflow.

Reset row counters command Reset row counters assigning resultset to variables

Update parent variables (Script Task)

This script task updates the master package variables with the values of the current package variables. Two important quirks to note are

  1. The master package and child package variables must be named differently
  2. The User:: prefix should be omitted from the master package variable names
Updating the parent variables using an Execute Script Task

The script is straight forward once the variables have been passed in

/// <summary>
/// This method is called when this script task executes in the control flow.
/// Before returning from this method, set the value of Dts.TaskResult to 
/// indicate success or failure. To open Help, press F1.
/// </summary>
public void Main()
{
    Dts.Variables["RowsDeleted_Master"].Value = Dts.Variables["RowsDeleted"].Value;
    Dts.Variables["RowsInserted_Master"].Value = Dts.Variables["RowsInserted"].Value;
    Dts.Variables["RowsRead_Master"].Value = Dts.Variables["RowsRead"].Value;
    Dts.Variables["RowsUnchanged_Master"].Value = Dts.Variables["RowsUnchanged"].Value;
    Dts.Variables["RowsUpdated_Master"].Value = Dts.Variables["RowsUpdated"].Value;
    Dts.TaskResult = (int)ScriptResults.Success;
}

The extended slowly changing dimension data flow

The majority of the data flow is generated by the wizard as indicated below. I have annotated the data flow so future developers are aware of the implications of running the wizard again

Extended slowly changing dimension dataflow

Select from staging (OLE DB Source Connection)

This connection sources data from the staging database view vw_Dim_Employee. This view (and all views in the load patterns) is responsible for all transformations and the hashing of each rows data. Using a view for the data transformations allows for transparency and ease of debugging. Not having to run a package each time you are trying to cast or derive a column speeds up the development process and keeps the load packages relatively standard.

The only transformation happening in this particular view is ensuring nulls are removed, but depending on the fact or dimension being loaded, these definitions can become quite complex.

create view dbo.vw_Dim_Employee
as

	select	EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, HireDate, 
		HashCode = cast(hashbytes('MD5',
		coalesce(cast(EmployeeID as nvarchar), '') + 
		coalesce(cast(LastName as nvarchar), '') + 
		coalesce(cast(FirstName as nvarchar), '') + 
		coalesce(cast(Title as nvarchar), '') + 
		coalesce(cast(TitleOfCourtesy as nvarchar), '') + 
		coalesce(cast(HireDate as nvarchar), '')
		) as varbinary(16))
	from (
		select
			EmployeeID, 
			LastName = coalesce(LastName, ''), 
			FirstName = coalesce(FirstName, ''), 
			Title = coalesce(Title, ''), 
			TitleOfCourtesy = coalesce(TitleOfCourtesy, ''), 
			HireDate
		from	
			dbo.stage_Employees
	) t

go

In the case of the Slowly changing dimension, the HashCode isn't utilized in the data flow. However, it is for all other loads, so it's a good habit to get into by including it in the view. This keeps all transformation views, dimension and fact standard, having a hash and a runId column.

OLE DB data source for Employee dimension load

Rows Read (Row Count Task)

Assigns the rows selected from the source to the User::RowsRead package variable

Add RunId (Derived Column Task)

As for the extract pattern, adds a RunId column to the flow which is populated from the User::RunId variable.

Slowly changing dimension (Slowly changing dimension Task)

Stepping through the wizard, the steps of interest are

  1. Selecting the EmployeeID as the business key

    Tables and Keys page of SCD Wizard
  2. Assigning the Hire Date as a changing dimension. This is likely only to be changed to correct an error, an historical record is not required

    Columns page of SCD Wizard
  3. Assigning the start and end dates for record expiration

    Historical Attributes page of SCD Wizard

The rest of the wizard screens have been left as default. Once the wizard has completed the tasks above will be generated. Many will now be extended to include the EffectiveFlag column

Extending the Update Branch

The changes made to the update branch are to set the EffectiveFlag to false for historic records.

Update branch of the SCD

Derived Column

Adding a EffectiveFlag column with its value set to false

Adding an EffectiveFlag derived column (false)

OLE DB Command

Extending the update command to recognise this column

Updating the update command with the EffectiveFlag

Extending the Insert Branch

The changes made to the insert branch are to set the EffectiveFlag to true for new records.

Insert branch of the SCD

Derived Column 1

Adding a EffectiveFlag column, this time with its value set to true

Adding an EffectiveFlag derived column (true)

Insert Destination

Extending the insert destination to recognise this column

Updating the insert destination with the EffectiveFlag

Summary

Testing the above, changing Margaret to Maggie we can see a new record is added.

Results of Dim_Employee

Querying the Dim_Employee table is now as simple as

select * from dbo.Dim_Employee where EffectiveFlag = 1

Source Code

git clone https://github.com/mindfulsoftware/SSISMasterChildPattern.git