Sql Server 2012 Integration Services Master Child Pattern

A pattern for loading fact tables with inserted, updated and deleted data

Index

Creating a simple Master Child pattern within Sql Server 2012 Integration Services
The Master package for loading child packages in Integration Services 2012
The child extract package pattern in Integration Services 2012
The child dimension load pattern, extending the Slowly Changing Dimension
A pattern for loading fact tables with inserted, updated and deleted data
Project Deployment with Sql Server 2012 Integration Services

Introduction

This pattern will demonstrate loading deltas into a fact table by comparing what's already in the warehouse with what's in staging. Again, all the transformation logic is within the staging view, making development faster and easier to debug. More information for this design decision is here, along with the control flow description, which is identical for this package.

The child fact load package data flow

Data flow for a fact load package

Select from staging (OLE DB Source Connection)

The transformation and load logic is embedded within the view. The key aspects of the view definition are

  1. Creating a hash of each record which will be used for row comparison later in the flow
  2. Assigning and using warehouse keys rather than business keys in the fact table
  3. Transforming data (removing nulls, casting, etc)
create view dbo.vw_Fact_Orders
as
	select	OrderId, ProductKey, EmployeeKey, OrderDateKey, RequiredDateKey, 
		ShippedDateKey, UnitPrice, Quantity, Discount,
		HashCode = cast(hashbytes('MD5',
			coalesce(cast(OrderId as nvarchar), '') + 
			coalesce(cast(ProductKey as nvarchar), '') + 
			coalesce(cast(EmployeeKey as nvarchar), '') + 
			coalesce(cast(OrderDateKey as nvarchar), '') + 
			coalesce(cast(RequiredDateKey as nvarchar), '') + 
			coalesce(cast(ShippedDateKey as nvarchar), '') + 
			coalesce(cast(UnitPrice as nvarchar), '') + 
			coalesce(cast(Quantity as nvarchar), '') + 
			coalesce(cast(Discount as nvarchar), '')
			) as varbinary(16))
	from (
		select
			o.OrderId, 
			ProductKey = p.ProductKey,
			--CustomerKey = ,
			EmployeeKey = e.EmployeeKey, 
			OrderDateKey = do.DateKey,
			RequiredDateKey = dr.DateKey,
			ShippedDateKey = ds.DateKey,
			UnitPrice = od.UnitPrice,
			Quantity = od.Quantity,
			Discount = od.Discount
		from	
			dbo.stage_Orders o
		join	
			dbo.Stage_OrderDetails od on o.OrderId = od.OrderId
		join
			DW_Northwind.dbo.vw_Dim_Employee e on o.EmployeeID = e.EmployeeId
		join
			DW_Northwind.dbo.Dim_Product p on od.ProductID = p.ProductId
		join
			DW_Northwind.dbo.Dim_Date do on 
				cast(convert(varchar(8), o.OrderDate, 112) as int) = do.DateKey
		join
			DW_Northwind.dbo.Dim_Date dr on 
				cast(convert(varchar(8), o.RequiredDate, 112) as int) = dr.DateKey
		join
			DW_Northwind.dbo.Dim_Date ds on 
				cast(convert(varchar(8), o.ShippedDate, 112) as int) = ds.DateKey
) t

The columns are selected in the OLE DB Source editor, with an additional derived column casting the hash to a string

Fact source data from staging

Select from warehouse (OLE DB Source Connection)

The warehouse source selects the identical columns from the fact table only. Each column is prefixed with db_ to provide uniqueness in the merge join

Fact source data from warehouse

Column sorting from both the staging and the warehouse

Before we can merge the 2 sources, both have to be sorted in order of the primary key of the source (OrderId, ProductId / Key). This sorting is available under the Advanced Editor or each source

Enabling the sort of an OLE DB data source output

The OrderId should have the SortKeyPostion of 1 and the ProductKey 2

Sorting the OrderId of an OLE DB data source output

Note: The sorting must be in place for both the staging and warehouse data sources.

Merge Join

The Merge Join task now merges the sorted outputs, joining on the OrderId and ProductKey.

Merge join using a full outer join

Conditional Split

This is where the hashing comes into play, detecting whether a record has changed. New and Old records are also detected, and the flow routed appropriately.

Conditional split of the merged data

The splits are, with reference to the fact table

  1. Delete
    A record does not exist in the source but does in the warehouse (the full outer join returns null for staging) a delete is required
  2. Insert
    A record does not exist in the warehouse (the full outer join returns null for the warehouse) an insert is required
  3. Update
    Records exist on both sides on the join and the hashes do not match, an update to the warehouse is required
  4. No Change
    Records exist on both sides on the join and the hashes match, no action required

After the conditional split, each of the commands and the insert destination are self explanatory, with screenshots below

Update Fact (OLE DB Command)

OLE DB update command statement for Fact_Orders OLE DB update command mappings for Fact_Orders

Insert Fact (OLE DB Destination)

OLE DB Destination for Fact_Orders

Delete Fact (OLE DB Command)

OLE DB delete command statement for Fact_Orders

Summary

The key aspect to this pattern is the correct configuration of the staging and warehouse data sources, they should

  • Have the same column count
  • Each column should be of same ordinal, data type and length
  • The sorting should be identical for each source

With merge join dependencies correctly configured, everything else falls into place.

Source Code

git clone https://github.com/mindfulsoftware/SSISMasterChildPattern.git