Introduction
One of the most important things that you can do as you begin a development effort is to develop a common, unified set of standards. In prior posts I have stressed the importance of naming conventions for database files, as well as connections in the packages. This is less about the naming of the objects, but rather the consistency, supportability, and modular design that SSIS allows which in turn creates a restartable product.
I do not run a package in production that cannot be started “from the top” by itself. There is quite a bit of thought and effort in some cases to ensure data consistency, proper cleanup of work tables, etc, however, in every case that work has paid off (multiple times over). There is nothing worse (and more dangerous) than to have to do research to figure out what part of a package needs to be disabled, which work tables need to be manually cleaned up, what data needs to be backed out before a restart…. generally when those things have gone wrong, a package has failed, it is not the time to try to remember what you were thinking when you (or someone else) was thinking when the package was built.
Package Design
Every package, dimension load, fact table, post-load processing, does not matter…. they ALL look exactly the same. There are 3 parts. Part 1 consists of establishing the audit-key in the audit table (there will be another post on SSIS load audits), determining some audit (pre-load) information about the table that is about to be affected. All of part-1 is driven by package variables, so there is no hard-code in this section. In addition some packages will have specific cleanup processes before part-2 which is the ETL. In the cleanup, you need to cleanup from the staging table (if you are going to re-stage the data), cleanup from the target if the ETL does not deal with existing data. This is the section that requires the most thought…. think about what your package needs to do should you run out of space in your target, you loose database connections to remote servers, any series of things that can go wrong…. should you need to start this package from the top, what data needs to be cleaned so the ETL can run cleanly. Part-3 is the post-load audit wrap-up. Again, driven by variables, after the load audit information is collected and the audit-rows is closed out with a completion time as well as statistics about that load.
Conclusion
There are many advantages to consistent package design, not the least of which is forcing the issue of restartablility. It is important that each modular piece of a load process is independent and functions as a standalone process. True, there are dependencies that need to be met between load processes,however, beyond those dependencies, the processes should each handle their own logical part of work. 1 package = 1 dimension load…. or 1 fact load etc. resist the urge to combine too much into a single package, restart is important, and if you load does too much in a single package, restart becomes difficult.
Leave a Reply