I have seen several posts lately asking about ROI/Justifications etc about upgrades. There are LOTS of posts on this topic, and some really great documentation from Microsoft that covers this topic at great length. This post does not really add anything to those, other than distilling some of the key points from the Microsoft documentation that I used when discussing the upgrade, benefits, and expectations with my business leaders.
With the 2008 release, SQL Server makes a major advance in scalability for data warehousing. It meets the data warehouse needs of the largest enterprises more easily than ever. SQL Server 2008 provides a range of integrated products that enable you to build your data warehouse, and query and analyze its data. These include the SQL Server relational database system, Analysis Services, Integration Services, and Reporting Services. This paper introduces the new performance and manageability features for data warehousing across all these components. All these features contribute to improved scalability.
Microsoft® SQL Server™ 2008 provides a comprehensive data warehouse platform. It enables you to build and manage your data warehouse, and deliver insight to your users, with a single, integrated product suite. It scales to meet the needs of the largest enterprises, in a way that empowers both your end users and your IT staff.
The number one focus of development in the SQL Server 2008 release was to improve scalability across the entire product suite to comfortably meet the needs of large enterprises. Here, we’ll introduce the features and enhancements we’ve added to improve your data warehouse experience. Build. Manage. Deliver. SQL Server 2008 lets you do it all, with ease.
Map of New Data Warehousing Features
The following table shows the new scalability features in SQL Server 2008, and where they help with the activities that surround your data warehouse (DW).
|SQL Server Relational DBMS||MERGE statement
Change data capture (CDC)
Minimally logged INSERT
|Backup compression||Star join performance
Faster parallel query on partitioned tables
Partition-aligned indexed views
|Integration Services||Lookup performance
|Analysis Services||Backup||MDX Query Performance: Block Computation
Query and Writeback Performance
|Scalable Shared Database|
|Reporting Services||Reporting scalability
The new data compression feature in SQL Server 2008 reduces the size of tables, indexes or a subset of their partitions by storing fixed-length data types in variable length storage format and by reducing the redundant data. The space savings achieved depends on the schema and the data distribution. Based on our testing with various data warehouse databases, we have seen a reduction in the size of real user databases up to 87% (a 7 to 1 compression ratio) but more commonly you should expect a reduction in the range of 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).
Backup compression helps you to save in multiple ways.
By reducing the size of your SQL backups, you save significantly on disk media for your SQL backups. While all compression results depend on the nature of the data being compressed, results of 50% are not uncommon, and greater compression is possible. This enables you to use less storage for keeping your backups online, or to keep more cycles of backups online using the same storage.
Backup compression also saves you time. Traditional SQL backups are almost entirely limited by I/O performance. By reducing the I/O load of the backup process, we actually speed up both backups and restores.
The new Resource Governor in SQL Server 2008 enables you to control the amount of CPU and memory resources allocated to different parts of your relational database workload. It can be used to prevent runaway queries (that deny resources to others) and to reserve resources for important parts of your workload. SQL Server 2005 resource allocation policies treat all workloads equally, and allocate shared resources (for example, CPU bandwidth, and memory) as they are requested. This sometimes causes a disproportionate distribution of resources, which in turn results in uneven performance or unexpected slowdowns.
With dimensionally modeled data warehouses, a big part of your workload typically consists of what are known as star join queries. These queries follow a common pattern that joins the fact table with one or several dimension tables. In addition, star join queries usually express filter conditions against the non-key columns of the dimension tables and perform an aggregation (typically SUM) on a column of the fact table (called a measure column). With SQL Server 2008, you will experience significant performance improvements for many star join queries that process a significant fraction of fact table rows.
Partitioned Table Parallelism
Wouldn’t you like to get the most power you can out of the hardware you own? The partitioned table parallelism (PTP) feature in SQL Server 2008 helps you do that. Data warehouse applications typically collect large amounts of historical data in fact tables, which are often partitioned by date. In SQL Server 2005, queries that touch more than one partition use one thread (and thus one processor core) per partition. This sometimes limits the performance of queries that involve partitioned tables, especially when running on parallel shared memory multiprocessor (SMP) computers with many processor cores. Partitioned table parallelism improves the performance of parallel query plans against partitioned tables by better utilizing the processing power of the existing hardware, regardless of how many partitions a query touches. The feature works by default without the need for manual tuning or configuration.
Partition-Aligned Indexed Views
Partition-aligned indexed views enable you to create and manage summary aggregates in your relational data warehouse more efficiently, and use them in scenarios where you couldn’t effectively use them before, improving query performance. In a typical scenario, you have a fact table that is partitioned by date. Indexed views (summary aggregates) are defined on this table to help speed up queries. When you switch in a new table partition, the matching partitions of the partition-aligned indexed views defined on the partitioned table switch too, and do so automatically.
This is a significant improvement over SQL Server 2005
GROUPING SETS allow you to write one query that produces multiple groupings and returns a single result set.
The MERGE statement allows you to perform multiple Database Manipulation Language (DML) operations (INSERT, UPDATE, and DELETE) on a table or view in a single Transact-SQL statement.