Database Driven Aggregations

Have you ever had a series of aggregate (tsql) queries to run?  Do you want to be sure they all run in the right order?  Do you want to make the changes “dynamic” (no code changes required)?  Certainly you can build an SSIS package that calls sql with the control-flow and control dependencies, however, each new query requires changes to that package.  This solution has a single proc-call (at the right point in the control flow), then hands control to the database.  This allow you to add new queries, change the order, disable, enable new ones, etc.

A few things to get setup before we start:
1) Table Creation for aggregation process — Optional table creation for logging
2) Proc Creation — This proc will require some modification for your specific environment.  There is logging of the process that takes place (in the optional table script), and some database links that you will need to adjust according to your specific needs.

Now for the easy part!  Insert into the aggregate driver a row for each query (primary) query you want to run, there may be sub-parts to the query, that is handled in the next table.  This table is the “driver” of the process, has the descriptive info, and allow for “active/inactive” setting.
Aggregate_Query_ID – Something that makes sense to you (I like logical operations together in groups of 1000)
Aggregate_group_ID – What you call the proc with (everything that is active for this group will run – in the order they are in the table)
Aggregate_freq_id – Future enhancement for Weekly/Monthly executions
Aggregate_Desc – What is this query, just a description
dim_audit_pkg_name – audit, what process is this query tied to (optional)
Source_Table_Name – what is the primary table for this query (descriptive, optional)
Source_Column_Name – what is the primary column for this query (descriptive, optional)
active_ind – Bit 0=off 1=on
aggregate_server – target server where the output of process will go
aggregate_database – target database where the output of process will go
aggregate_owner – target schema where the output of process will go
aggregate_table_name – target table where the output of process will go

Next, insert into the aggregate query table.  This is where the actual work goes.
aggregate_query_id – ties this group of queries to the aggregate driver table
aggregate_sequence – for queries that require more than 1 step, this is the order those queries will execute
query_desc – description (optional)
query – tsql valid statements to be executed

exec dw_dbo.aggreage_master_package 3  –this would execute the proc running all queries defined in aggregate_group_id 3 (this is the call in the SSIS package)  From here on out, any addition, change, inactivation to queries running in group 3 will be done in the database.

You have 2 different processes, each process has multiple queries, and each query has multiple parts.

agg_qry_id agg_grp_id agg_freq_id aggregate_desc dim_audit_pkg_nm source_tbl_name   active_ind agg_server agg_database agg_owner agg_tbl_nm
1005 1 1 Example aggregate 1 example_pkg orders   0 maggie dm_reporting dw_dbo dw_agg_order
1006 1 1 Example aggregate 2 example_pkg orders   1 maggie dm_reporting dw_dbo dw_agg_order
1050 1 1 Example aggregate 3 example_pkg orders   1 maggie dm_reporting dw_dbo dw_agg_order
2005 2 1 Example aggregate 4 example_pkg customers   1 maggie dm_reporting dw_dbo dw_agg_customer
2006 2 1 Example aggregate 5 example_pkg customers   1 maggie dm_reporting dw_dbo dw_agg_customer
2007 2 1 Example aggregate 6 example_pkg customers   1 maggie dm_reporting dw_dbo dw_agg_customer


You see that when we call the proc with group_id “1”, there are 3 queries, the first of which will be skipped because active_ind is set to false.  When we call the proc for group 2 all group 2 queries will run.

Taking a closer look at the first query that will execute under group 1 (which is actually the second query 1006):

aggregate_query_id aggregate_sequence query_desc query
1006 1 Delete data from staging table on server Bart (stg_dw_agg_order) DELETE FROM bart.dw_stg_db.dw_dbo.stg_dw_agg_order
1006 2 Insert aggregated year 2010 data into staging table INSERT into osprey.dw_work.dw_dbo.stg_dw_agg_order (order_year, customer_id, state, annual_sales) select year, customer_id, state, sum(annual_sales) where year = 2010 from dw_dbo.orders group by year, customer_id, state
1006 3 Insert aggregated search year 2010 data into dw_agg_order BEGIN TRAN; DELETE FROM dm_reporting.dw_dbo.dw_agg_order WHERE year = 2010; insert into dm_reporting.dw_dbo.dw_agg_order with (tablock) (year, cusomter_id, state, annual_sales from bart.dw_staging.dw_dbo.stg_dw_agg_order; COMMIT;


There are three parts for this query
1) clears the data from the staging table on the staging server
2) run the query to collect the 2010 data into the staging server table
3) remove the existing aggregate rows for the 2010 customers, and then replace them with the new data (this is done in a transaction, with tablock on purpose, if someone is querying the table this query will wait to obtain the lock, if this query is running, there are no other queries allowed, they will block and wait).  You don’t want someone getting a report of 2010 with no data…
 Once query 1006 is done…(meaning all 3 parts) the next query 1050 (which can have multiple parts as well) is then started.  Separating things into logical groups will allow you to have multiple calls to the proc run from SSIS at the same time, achieving some parallelism.  Once a year is done, you can simply turn it off, and add a new row for the next year, this allow for a re-run of historical aggregations (should there be a data issue that requires updates/changes to the aggregation, or simply an additional field in the aggregations) by setting the active indicator back on.

Tags: , , ,

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: