Warehouse Design Tip / Partitioning

Introduction

This topic seems so simple, and so obvious to me, and do not mean to insult anyone’s intelligence.  However, it is shocking to me the number of people that I have talked to (running large databases) that are not taking advantage  of one of the best performance, and manageability enhancements out there!  Think about the classic phone-book example everyone uses for index’s…. but broaden the example a little.  Consider you have a phone book from NY City, LA, Miami, and Chicago all combined into a single book…. even well indexed it is going to a little bit of effort to find all of the records for a given name in a given city.  Now if you partition that giant book on city, you are not even going to go look in the books that you’re not interested in. …OR… lets say you wanted to find a given name in all four cities, if you have one giant book… you will start looking… if you have the book partitioned into separate books, you could hand one book to three of your friends (threads) and each of you start looking in parallel… clearly divide and concur will get you the answer much more quickly.  

 In a warehouse environment I suggest finding some sort of logical annual partition.  It may be a calendar year, it may be a fiscal year, it may be some annual business cycle.  Make sure what you pick will make sense to your users (to make use if this power they need to include it as filter/join criteria in their queries).  If you pick an annual partition based on calendar year, however the business data is always looked at based on a fiscal year ending in July, even if the queries are well written to filter on the year, you will always use 2 partitions to answer a question because the fiscal year splits 2 calendar years.  

Example

 You can use the wizard in SSMS, or scripts either way you will need to create a partition function  

CREATE PARTITION FUNCTION [year_PF1](INT)
   AS range LEFT
   FOR VALUES (2005, 2006, 2007, 2008,
               2009, 2010, 2011, 2012,
               2013, 2014)  

Once you have the partition function, you will need the partition scheme to be associated to the files groups using a partition scheme  

CREATE PARTITION SCHEME [year_PS1]
   AS PARTITION [year_PF1] TO
   ([data_2005], [data_2006],
    [data_2007], [data_2008],
    [data_2009], [data_2010],
    [data_2011], [data_2012],
    [data_2013], [data_2014],
    [data_2014])  

Now you can create a table on the partition scheme allowing all data to be grouped together by year. The key portion of this statement is the “on” at the end of the table create statement.  

CREATE TABLE partition_example
   (      
      example_year  INT,
       fname         VARCHAR(100),
       lname         VARCHAR(100),
       rec_create_dt DATETIME,
       rec_mod_dt    DATETIME,
       audit_key     INT
   )  ON year_ps1(example_year)  

   

Conclusion

This is an overly simple example, however, please do yourself the favor and understand how partitioning works.  See my prior discussions on disk management to see how you can simply move data files from one drive to another, as well as the great optimization, and parallelism performance improvements you will see.

My Favorite New(ish) 2008 Feature (Resource Governor)

#TSQLTuesday

I have written in the past about setting up the resource governor, which I am counting as my favorite new(ish) feature, so I am taking this TSQL2sday post opportunity do my promised followup for monitoring of the resource governor.

Introduction

To start with, my advice about the resource governor is to start small.  Have some idea of what you want when you are done… do you need three pools, five pools… how may groups do you envision etc.  Once you “think” you know where you want to go, start small.  Start with a basic A/B split (user interactive Vs Batch process for example) and work your way up from there, monitoring the whole way.  Well balanced, the resource governor works very well… all your resources being consumed in a single pool gets you nowhere (and depending on your max settings can set you back quite a bit)…. so to achieve the “well balanced” lets look at monitoring.

The primary method for monitoring the internals of the resource governor is through the Performance Monitor statistics made available.

Through these metrics you will be able to see both the pool and group level information in much the same way that you would normally see sql server performance statistics, however they are now broken down at a lower level (so you can see how your allocations are performing).  Keep in mind an important point about resource governor, if you allocate 10% of the CPU as a max to a pool, that pool can still get 100% of the CPU if nothing else is running on the machine.  The resource governor ONLY “kicks in”, holds back resources, splits resources among pools (however you wish to think of it) if there is competition between pools.  This makes it important to determine how many, and which resources you are going to assign to the pools.  Pool Stats:

I have setup

  • Default – PLEASE don’t get rid of the default pool… I had discussion of this in my prior resource governor post.
  • Internal – Internal DB processes run here
  • adhoc_pool_high – This is where I have my “normal” interactive user queries, reports.
  • adhoc_pool_mde – This is where I have the “High Utilization” tools such as SPSS.  There are tools that will select large amounts of data for mining or other analysis where runtime is not measured in seconds, and can be separated to prevent those very large queries from dominating the machine while still running in reasonable times.
  • load_pool_high – This is where I have batch processes.  There is a scheduled job that changes priorities for this pool to a higher-level overnight for my batch-load processes.
  • maintenance_pool_low – This is where I run long-running maintenance processes that I just need to get done, but don’t care how long it takes… compressing historical data etc.  Things I want to keep out of the way of the rest of the system.

As you can see in the screenshot there is little else running right at this moment, other than the adhoc_pool_high, so in reality that pool has access to 100% of the cpu until there is something that shows up in one of the other pools that needs resources.

Workgroup setups

Assigned to each of the pools contains one or more workgroups, in this case you can see that the 74% of the adhoc_pool_high is all being used by the “special_user” workgroup.  This is where starting simple will help you greatly, there is no association (other than your knowledge… or looking in the database) between workgroups and pools that you can see in performance monitor.

My associations are

Default pool
   Default workgroup
Internal pool
   internal workgroup
Adhoc_pool_high
   Special_user (specific ID’s that I wish to GUARANTEE performance to)
   dashboard_user (Interactive BI dashboard that NEEDS to perform well)
   User Interactive (interactive reports, management studio, etc_
Adhoc_pool_med
   report_user (scheduled reports)
   user_tools (General bucket for tools other than interactive queries)
   user_spss (SPSS gets it own resource allocation to prevent it from overrunning the machine)
Load_Pool_High
   load_batch
Maintenance_Pool_low
   maintenance

It is at this level where you will be monitoring the resource allocations to see that they are meeting your expectations.  Resource governor is really where art and science meet, you will need to determine which workgroups/pools you wish to give priority to, and which ones will take the back-seat.  Once things are running that cross between workgroups that are assigned to different pools you will see the cpu control % numbers take effect.

In addition to the performance monitor stat, which I have personally found most useful for immediate feedback on the performance of the pools there are a few DMV’s that collect information along the way.  It is through these DMV’s that you can understand the associations between the workload group and the associated pool, as well as what your configurations are.

dm_resource_governor_workload_groups  Where you can see what your allocations/configurations are.
dm_resource_governor_resource_pools Where you can see what your allocations/configurations are.
dm_resource_governor_resource_configuration Is it enabled?

Richmond CodeCamp Disk IO Presentation

This is a very similar post to the one from the user-group presentation post… and in fact the presentation itself is very similar.  It has been updated, and a few things added since the last presentation, so I did want to post the updated version which was presented at codecamp.   I hope that the presentation left you with a few things to go think about, and gave you some tools to work with.

If you came out to see this presentation, please provide me some feedback through speaker-rate.  The feedback helps me build better presentations… and hopefully provide better, more informative sessions to you in the future.

I enjoyed the whole codecamp experience, and I hope that you got as much out of the day as I did!!  A truly impressive group of presenters with a great breadth of topics, I was glad to be a small part of the great day.

As I mentioned, everything I referenced in the presentation can be found my series on disk performance:

Help — My disk performance      (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)
Help — My disk performance IV (filegroups and files)  
Help — My disk performance V   (Monitoring)  
Help — My disk performance VI (Additional monitoring and conclusions)

Also, as promised, the presentation is available for reference on my Presentations page

Richmond Users Group Disk IO Presentation

 A sincere thank you to everyone that came out to my presentation on disk IO.  I hope that you had as much fun as I did, and (hopefully) took away a few things to think about.  I would appreciate your feedback , I enjoy speaking, and the feedback helps me build better presentations for the future.

As I mentioned, everything I referenced in the presentation can be found my series on disk performance:

Help — My disk performance      (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)
Help — My disk performance IV (filegroups and files)  
Help — My disk performance V   (Monitoring)  
Help — My disk performance VI (Additional monitoring and conclusions)

Also, as promised, the presentation is available for reference on my Presentations page.

SSIS – Load Audit

Introduction

There are many reasons to put some thought into an audit process for your load jobs.

  1. Tie the insert/update of a row to a specific SSIS package.
  2. Provide consistent date-stamp times for a given load (I will explain this in a moment).
  3. Ensure all the data staged actually made it to the target.
  4. Monitor load times, and easily find packages (or tables) that need tuning.
  5. Provide input to load jobs for selecting data from the source (you know the last time you ran successfully, select data that changed since then in the source… that is a topic for another post)
  6. Provide SLA reporting information on load-times.
  7. Probably a few other things that you will discover as you implement this.

If well thought out up-front, the additional major benefit of a good audit process is a cookie cutter approach to building load jobs.  They ALL do exactly the same thing, in exactly the same way using the same variables, and setup.  (The details of this is for another post, but I will cover the generalities here)

The Setup 

CREATE TABLE [dw_dbo].[ld_control_audit]
   (
       [audit_key]                 [INT] IDENTITY(1, 1) NOT NULL,
       [parent_audit_key]          [INT] NOT NULL,
       [tbl_name]                  [VARCHAR](50) NULL,
       [pkg_name]                  [VARCHAR](50) NULL,
       [exec_start_dt]             [DATETIME] NULL,
       [exec_stop_dt]              [DATETIME] NULL,
       [extract_row_cnt]           [BIGINT] NULL,
       [staged_row_cnt]            [BIGINT] NULL,
       [insert_row_cnt]            [BIGINT] NULL,
       [update_row_cnt]            [BIGINT] NULL,
       [error_row_cnt]             [BIGINT] NULL,
       [tbl_initial_row_cnt]       [BIGINT] NULL,
       [tbl_final_row_cnt]         [BIGINT] NULL,
       [tbl_max_surrogate_key]     [BIGINT] NULL,
       [successful_processing_ind] [CHAR](1) NULL
   )

Once you have the table, the view (in the default dbo schema) also calculates the execution time and executes with a NOLOCK, you want to be sure there is no locking by your monitoring processes on this table.

CREATE VIEW [dbo].[ld_control_audit] AS
   SELECT audit_key,
          parent_audit_key,
          tbl_name,
          pkg_name,
          exec_start_dt,
          exec_stop_dt,
          DATEDIFF(MINUTE, exec_start_dt, exec_stop_dt) AS runtime_minutes,
          staged_row_cnt,
          extract_row_cnt,
          insert_row_cnt,
          update_row_cnt,
          error_row_cnt,
          tbl_initial_row_cnt,
          tbl_final_row_cnt,
          tbl_max_surrogate_key,
          successful_processing_ind
   FROM   dw_dbo.dimaudit AS dimaudit_1 WITH (nolock)

Usage

Now that you have the basic table and view, the first thing your load will do is insert a row for the “Master Package” , which will create a new with a -9999 parent_audit_key to tie all of your “sub” executions to, and creates the exec_start_dt of this particular execution.

You now have an option…. you can select back the identity column you created, and pass it into each of your packages that you execute, or let each package lookup to get the audit-key that they will use as the parent_audit_key.  I DO NOT pass it in, but rather lookup at the top of each sub-package.  Reason 1, I execute all the sub-packages as out-of-process, to obtain a separate memory space for each of the executions, as I am running MANY processes in parallel.  Reason 2, restartability, if each package functions as an independent unit of work (including looking up the parent_audit_key) if a package fails for any reason, it can be re-run by itself without having to invoke the master-package, or manually adjusting variables etc.  If you are using  checkpoint files for restart this still plays well with that as the existing variable states (including audit_key assigned to the package, as well as parent_audit_key, record counts etc) are preserved for restart.

 

 If you have been reading any of my other posts, by now you have figured out I am “stuck” on naming conventions and consistency, but if you open ANY load package, they all look the same, so lets walk through the parts.

  1. Selecting the parent audit key from the audit table (max key for “master” for successful_processing_ind is null)
  2. Insert a new row into the audit table (VARIABLES for everything in the insert)…This is cut-paste for any package.
    There is a second sql statement that then selects the max audit-key for the @tablename that we just inserted a row for (this is what we will stamp on ALL rows affected by this load)
  3. This will select the package start-time from the audit table row.  This will be used for the rec-mod/rec-create dates in ALL rows affected by this load.  We are effectively going to tie all of the data in the warehouse back to a specific audit row, and a specific execution time.  If data needs to be backed out due to source issues or otherwise, all rows affected by the run, all have the same time.
  4. Using a scripting task, (same result as using an expression build in bi-express product if you have that).  This task sets SQL statement variables that will be executed before (and after) the load.
  5. Executes the count sql statement just built by the scripting task by using the table-name variables in the package.
  6. Executes the select max surrogate key sql stmt built by the scripting task using the table-name (or if this is a fact table, we select the max audit-key)… this is to capture with absolute certainty the starting point of the load
  7. This is the ONLY unique thing in this package.  EVERYTHING else is driven by the variables and dynamic SQL.
  8. Executes the count sql statement just built by the scripting task by using the table-name variables in the package.
  9. Update stats (this is for dimension loads only) the fact table stats/index’s are maintained by a nightly script, that is too large to run at load-time
  10. Update the audit row, set the end-time, and success-ind

Conclusion

There are many reasons for consistency… audit and maintainability is a major benefit.  The audit will produce a row in an audit table that you can tie ANY row in the warehouse (dimension or fact) back to the specific execution of a package.  The audit table will give you execution times for packages (which is very useful to monitor ongoing load-times and bottlenecks).   It may seem like a lot of work up-front (it really is not), and since all of the infrastructure is driven by dynamic sql, and variables, there is nothing (other than changing the values in variables) that needs to be created to maintain a consistent audit process.

2008 Resource Governor – Does it work?

Clip to Evernote

Yes….it works….

 

Introduction

I am not going to pretend to be the expert on this topic, so please feel free to comment.  I am running resource governor in production (in a Data Warehouse environment), and simply want to share some of my observations, tips that worked, and “words of wisdom”.  This post is not a detailed instruction on how to setup the resource governor, there is a GREAT doc “Using the Resource Governor” Sql Server Technical Article by Aaron Bertrand, and Boris Baryshnikov.

Things to note:

  1. Read the doc!  There are lots of practical hints, monitoring recommendations etc.  Don’t jump in before you understand it.
  2. Keep the proc simple – especially to start off with.  I over-engineered by first attempt at the classifier proc with lookup tables for the different groups.  My aim was to make the process dynamic, and by adding/removing things from the tables at different times I could control how the groups behaved.  Not a great result…. this proc is called A LOT… make sure what you build is well optimized.
  3. Create a failsafe “out” for your account.  The FIRST thing I check in the proc is a test for my user-name and assign myself to a group.  May be paranoid here, but, if this proc goes badly, you may not get a connection to be able to alter it easily…. the last thing you want is to have to shutdown a production instance because you are trying out code that just simply did not work….. I KNOW we test, but things happen that shouldn’t.
  4. Keep the group setup simple – even if you have grand plans, start simple.  Start with your worst offending application, create the group for it, and let everything else fall into another group.  This will basically get you default (system processes), BigApp group (for me that is SPSS),  User Group (everything else).
  5. Understand the utilization you currently have.  I am not being smart… you need to do some monitoring before you start.  What does your CPU/memory utilization look like before you start.  How much parallelism are your large queries getting (the ones you’re going to put into the BigApp group).  If you don’t know where you are starting from, it is hard to know if things are getting better (or worse).
  6. Understand your target, nothing precise, this is “art”.  For me, I had a few applications that needed “special” treatment to allow everything to live together and get consistent results. So I wanted to reduce the parallelism on SPSS so it would not take-over (for long periods of time) and reduce performance for everything else.
  7. Set reasonable limits.  Especially at first, and even after running, never cut your default pool too short, you will be in for a surprise (not the good kind) if something drops into default, and you have allocated all the available resources to the other pools.
  8. Keep it simple (now I am being smart, but really… start slow with this thing you will thank me later)
  9. Use the resource governor!  A healthy dose of caution is good, but, it really does work.

Understanding the pools

Assuming you have an idle system, and a query comes into a pool that is allocated a max of 50% of cpu, it will be allowed to use all 100%, the governor does not prevent “over allocation” to a pool if the resources are idle. 

There are times (untill the machine has been truly idle for a little while) that you will see cpu utilization right at your minimum cpu guarantee.  This caused me concern at first… I had set aside a guarantee of 15% for a adhoc_pool_high, and then saw a jump (to 15%) in cpu utilization.  I thought that the resource governor was suddenly using 15% cpu overhead, that is not the case.  The governor was holding onto 15%, and anything submitted to that pool was automatically given that 15%.  You will reach an idle cpu condition, and the 15% will drop off, but if you see it, don’t assume the worst…. it is likely working exactly as you intend.

 

Monitoring

This will be another topic for at least one (if not more) posts…. but…. because I like DMV’s so much, here are a few to go play with:

  • dm_exec_sessions  (memory/cpu usage) either individually for each session…. or by group, the group_id is here.
  • dm_exec_requests (waits and times) too much cxpacket you may want to lower DOP for the group.
  • dm_exec_cached_plans  (plans cached, for the pool)  freeproccache also takes a new argument to free cache for a given pool-id
  • dm_exec_query_memory_grants (bunches of memory request info) it even gives an ideal_memory_kb for the ideal amount of memory for a request.  Particularly if you don’t have lots of memory to play with, this will give you good tuning advice.
  • dm_exec_query_resource_semaphores  – are you getting the memory you specified in the config of the pool.
  • dm_os_memory_brokers – How is memory being utilized over time.
  • dm_os_memory_cache_entries – detailed info about cache usage buffer vs non buffer.

In addition to the DMV’s there are changes to perf-mon data that you can see at the pool level.  All of this goes back to my earlier point about understanding your system before you start playing with the resource governor.  You will be quickly overwhelmed by stats, and the more pools you setup, the more stats you will be overwhelmed with.

What I ended up with

This is one of those things that changes, gets tweaked over time… but right now here is where I am:

  1. adhoc_pool_high – min cpu 10% (I set this down to 0% via a job-step during load times), but during the day this guarantees the interactive report-user (from SSRS) available CPU.
    1. Report_user group – the common account we use from SSRS lands here
    2. special_user group – the “failsafe” group for my account
  2. adhoc_pool_med – This is where all of the “normal” query tools, and users connect
    1. user_interactive group – has the highest priority in the pool, and 50% of the memory, this is basically all the users that are not otherwise specified
    2. user_spss group – has the lowest priority in the pool, 25% of the memory, and reduced parallelism to 4.  These are LARGE long running queries, this help prevent them from taking over.
    3. user_tools group – has medium priority in the pool, and contains things like excel, office services, tableau.  Reduced parallelism to 7 for this group, not a big reduction, but allows for even execution among the tools.
  3. Default – Default pool …. things land here that are not classified
  4. internal – Internal engine pool…. anything running as SA, or the account the sql engine is running as lands here
  5. load_pool_high – pool for batch-load jobs. (controled by user-account and machine they are connecting from)
  6. maintenance_pool_low – pool with reduced CPU max, this is intended for maintenance tasks that can run for a long time, and reduce the impact on the overall performance of the system. 

I hope this helps you take the steps to begin working with the resource governor!  It will take some work to get everything playing together just the way you want, and there will always be some queries that break the mold, but it is well worth the efforts.

Help – My disk performance VI

Introduction
Help — My disk performance      (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)
Help — My disk performance IV (filegroups and files)  
Help — My disk performance V   (Monitoring)  

This post will wrap-up some additional monitoring topics, and a few other odds and ends leftover from the other posts in this series.

Jumping right in, there are a few quick-checks relative to IO utilization that you can see from the database side using DMV’s.

Two Quick-Check DMV’s
sys.dm_io_pending_io_requests — If you have a large quantity of pending IO, this is an early indicator that you will see other metrics above the allowable limits for you disk configuration.  This is by no means a guarantee that you have an issue, however, is a good (quick) sanity check.

sys.dm_io_virtual_file_stats — I started the discussion about hotspots and how files are filled by SQL Server back in part IV of this series.  This view provides lots of insight into how things are being allocated on a per-file basis.

This sql is intended as a starting point … not a final solution, your usage will depend largely on how many databases you want to look at, how to report the data, etc.  The intent here is to show some output from a very useful and versatile DMV.

1) Create a table to grab the results from when done
CREATE TABLE #tmp_file_io
   ( dbname        VARCHAR(256),
       type_desc     VARCHAR(50),
       file_name     VARCHAR(256),
       read_xfer_ms  INT,
       write_xfer_ms INT   )

2) setup some variables
 DECLARE @count     INT,
                     @num_files INT
SET @count = 1

3) How many files do we need to look at?
 SELECT @num_files = (SELECT Count(*)
                     FROM   sys.database_files)

4) Go collect the data
WHILE @COUNT <= @num_files
  BEGIN

      INSERT INTO #tmp_file_io
                  (dbname,
                   type_desc,
                   file_name,
                   read_xfer_ms,
                   write_xfer_ms)

      SELECT Db_name(6)    AS dbname, <–obvious hardcode here, if your looping over databases, etc
             b.type_desc,
             b.name,
             io_stall_read_ms / num_of_reads   AS ‘disk read xfer ms’,
             io_stall_write_ms / num_of_writes AS ‘disk write xver ms’
      FROM   sys.Dm_io_virtual_file_stats(6, @count) a
             INNER JOIN sys.database_files b
                ON @count = b.file_id

      SET @count = @count + 1
  END

As a result of this, you will have a table with the IO rates by file.  You will be happy that you followed some sort of naming convention when you created the files, because this will quickly point out issues within a filegroup caused by autogrowth of files, or uneven allocation to files.  You don’t want to have gone through all the efforts to balance IO across 8 files, and then, by way of uneven allocation, have all the IO concentrated on a single file.  Again, this query (by itself) does not tell you if you have a problem, you need to review this for trends as the data is collected from instance start.  If you know you have queries that regularly access multiple years of data, across a large cross section of data, you would expect to see reasonably even distribution of IO across all the files in the filegroup.

There is certainly more detail, more script and more to write on this topic, and at some point, I am sure that I will continue with this series.  For now, I am going to wrap this one up.  There are a few things that I mentioned as I went through this series, and will be posting on all of these in the not too distant future. 

The “outstanding” topics that I touched upon, but will be covered in the future include:
–Indepth look at compression
–Resource Governor – How to
–Automated index reorg/rebuild
–Query tuning and explain plans
–Memory configuration, and considerations
–SSIS… modular design, dimension loading, etc.

Help – My disk performance V

Introduction
This is the fifth post in a series on disk performance pulled it together quickly to be included in some blogging fun for “IO, IO, It’s off to disk we go!”  on T-SQL Tuesday #004
Help — My disk performance      (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)
Help — My disk performance IV (filegroups and files)  

We have covered a lot of territory in the previous 4 posts, hopefully you have an understanding of where you want to be.  In this post I start moving away from the physical layout towards monitoring.  I can’t say it strongly enough, you must monitor you servers, don’t believe someone is going to watch for you, and even if you have people with the responsibility of watching… you need to watch anyway.  Several reasons for this, first off, your responsibility is the Data Warehouse.  I am not speaking badly of other groups monitoring systems, but they have their hands FULL, and may very well have a different tolerance for unacceptable performance due to the number of systems they are responsible for and each systems unique needs. Secondly, each Data Warehouse has a “feel” of its own that you will begin to understand with consistent monitoring.  You will get to know your normal performance levels, and at a glance at a graph be able to know someone has a query that needs tuning, or an index opportunity.  The monitoring is where the “art” and “science” begin to intersect. (Query tuning… a topic for another time) is the other place where “art” and “science” cross paths.   

At the most basic, monitoring for disk-IO starts with server monitoring from the windows perf-mon tool.  Start a collection process on the server that writes to a new file every 24 hours, in this collection (at a minimum) collect the avg disk queue (for each drive), avg disk reads per sec (for each drive), avg disk writes per sec (for each drive), avg cpu (total is fine).  There are other measures that are worthwhile – sql memory, kernal memory and quite a few others…. but this topic we are looking a disk-IO… trying to stay on topic (and will talk about monitoring tools that go beyond perfmon shortly).  

Now that you’re collecting data, you also need to know when things are happening on your server.  At the highest level:  

  1. When does your index maintenance run, and for how long? (you have automated index reorganize/rebuilds based on percent fragmentation right?!?… also another topic for later)
  2. When does your backup run, and for how long? (are you using SQL Server 2008 backup compression? … you should)
  3. Do you have other maintenance that is regularly scheduled (you’re doing a weekly dbcc checkdb right?)
  4. When does your normal nightly data-load start, and how long does it normally run?
  5. Do you have scheduled reporting? (monthly reports, thing that run off-hours due to their size etc)
  6. Do you have reporting that is triggered by the completion of a load?
  7. Do you have resource group allocations scheduled to change based on time/load etc? (Are you using SQL Server 2008 resource governor? …also another topic for later, but you should)

Loading up a representative day (don’t start this exercise with a Saturday if that is your least utilized time, nor should you start by looking at Sunday if that is the day you have a checkdb running for 10 hours) into the performance monitor tool to review the results.  Ultimately, you need to understand all of the time periods, including maintenance windows, but for now, lets start with a normal representative day.  

Average disk queue
Assuming you have 40 drives for each of you data drives, then the simple answer is, your avg disk queue of 2 per drive should stay bellow 80 on each of your data drives.  Don’t worry too much at this point about a spike above 80… if you sustain a queue above 80 you will have performance issues.  

It is possible to have a disk-queue reported by perf-mon on the windows side, and the disks be idle, if you are seeing disk queues, you should get your SAN administrator involved to look at the disk statistics for the LUN in question on the SAN.  IF you have no disk pressure in the SAN, and the queue is high in Windows, you have HBA latency that needs to be addressed.  

HBA Latency 
 This is really unlikely – even with a dual-port 2gb card this is not likely the slow part,  with a dual-port 4gb card it is very very unlikely. Most likely if you are seeing latency on the HBA there is something that is set incorrectly, or needs a firmware upgrade.
If you are actually seeing latency:
–Validate your port speed setting for both the card and the SAN switch (I have seen a switch port set to 100mb, you WILL have a performance impact)
–Validate your HBA firmware versions and compatibility with SAN OS versions, your SAN vendor will have very specific recommendations.
–Validate the SAN connections to the switches

Average reads/writes
Assuming you have 40 drives for each of your data drives, the simple answer is, the reads/writes should stay below 180-200 per drive.  This allows for 7200 – 8000 IO’s per second on each drive.  Again, don’t worry about spikes above this level unless they are sustain, or very close together. 

Average CPU
This has less to do with disk-IO, and more to do with overall performance measures.  An important thing to understand, and provides a large amount of insight into the overall responsiveness of your server.  Get to know what your normal CPU profile looks like in the different points of time I mentioned above. 

Examples
This chart shows a 9 drive array that has issues.  A little hard to read here, but the queues (blue) are WELL above 18 and the avg read/write activity is WELL above the 1600 – 2000 level.

This chart shows a 40 drive array that is performing within acceptable tolerances.  These queues show some spikes in activity, still remaining bellow the allowable 80 for this configuration, and going back to an earlier point about knowing what happens when…. those are all during maintenance and backup windows.

Help – My disk performance III

Introduction
This is the third post in a series on disk performance
Help — My disk performance       (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)

 Now that we have covered a large portion of the physical disk (leaving aside physically aligned partitions for a moment), I am going to jump over the next layer in the stack which would be filegroups and files to talk about partitioning  and compression first.  The logic of the filegroups and files will be easier to cover after this discussion.

Sticking with the basic layout that we arrived at earlier with two data drives, and understanding that we have a range partition scheme available to us (I wish we could subpartition as in Oracle… maybe someday), but given the range you need to decide what is a logical split.  The split needs to be logical to you… as well as your users… partition elimination in queries is a powerful thing (throw in compression, and start scanning all partitions because you picked something not logical to your users it is equally powerful in a bad way).  Making a general statement, in a data warehouse you can generally find a way that annual partitions make sense to everyone.  It might be calendar year, it might be fiscal year, it might be a sales year ending  in August… but generally every business will have some sort of logical annual cycle.  Given the fact that your building a warehouse, your likely interested in looking at comparing data between last year, and how performance is this year…. multi year trends etc.

Jumping right in, to create a partitioned table, you need to have the partition function, which defines your ranges, and a partition scheme which creates the connection between the file-group and the partition values that are to go into that range.  For your own sake, come up with a naming convention that makes sense.  This seems like an obvious point, but if you have an annual range for 2005 data and the filegroup it is associated with is named database_name_filegroup_6, (which then has files associated) you will (look at my picture) pull out your hair.  Once we get to filegroups, number of files etc I will go over this in more detail.  For now, it is sufficient to say that there are file-groups for each of the years, and each filegroup is associated with a given range.

CREATE PARTITION FUNCTION [Annual_Partition_PF1](int) AS RANGE LEFT FOR VALUES (2005, 2006, 2007, 2008, 2009, 2010);

CREATE PARTITION SCHEME [Annual_Partition_PS1] AS PARTITION [Annual_Partition_PF1] TO ([data_2005], [data_2006], [data_2007], [data_2008], [data_2009], [data_2010], [data_3]);

 

Tying this partitioning discussion back to the disks that we talked about previously, each partition (by way of file-group) is tied to a drive letter, which is one of the two data luns from the SAN.

Thus, if you are requesting data to compare 2010 and 2009 data you are accessing data (in parallel) from two different drives, which are connected to two different storage processors, which are connected to 2 different LUNS…load balanced across two switches connected to your dual-port HBA. I hope you’re starting to see this relatively clearly.

  

   

Compression: I  (at some point) I will go into greater depth on this topic alone, but for our purposes here, keeping it in the world of disk IO performance… proceed (with measured caution) but please do yourself the favor of working through the process of getting things compressed.  There are some gotchas that can tank your query performance, but the upside is HUGE.   Imagine each read of an 8k page brings back 16k of data, now we are getting disk IO performance!

 ALTER TABLE [responder] REBUILD PARTITION = 5 WITH(DATA_COMPRESSION = PAGE )

  

  

The space savings are obvious, and as we discussed with RAID-10 configuration, every MB of actual storage requires 2 MB of physical disk to allow for mirroring. If you can save 300Gb you have realized quite a substantial savings in disk.

The downside… DO NOT (at least not with regularity) update/delete/insert to the compressed partitions.  You WILL pay a penalty large enough that you will notice.  (I have seen cases on large tables with compressed index’s, etc, where the compression can cause a 4x slower response for update/inserts/deletes).

Remembering that we are talking about a warehouse layout, it is relatively safe to compress the data that is a few years old.  IF on the outside chance there is a historical update to the data it will run longer… but the space savings, and read rates are well worth the tradeoff.

Help – My disk performance II

 

 

OK, Post number two in two days on a rather weighty topic… don’t expect this forever! I am prepping for a presentation, and using this to collect my thoughts, as well as provide a place to point to during the presentation… there is no way I can get to all this detail in a presentation, so my hope is to provide a place for some self-paced study of my rambling thoughts.

Introduction
This is the second post in a series on disk performance
Help — My disk performance   (RAID how/why / what to put where)

 Now that we have covered some of the basics about disk, raid configurations, number and size of drives, today’s discussion will move up the stack (just a little) to the connections, and layout of those disks.  This discussion is more theoretical than specific to a particular SAN vendor, the logic applies regardless of the hardware provider, be it EMC, Hitachi, HP, etc.

Some basic assumptions (Your SAN admin folks are going to have done a good job protecting you at the san/network levels with redundancy.)

  1. SAN fabric has a minimum of two switches (This provides failover and redundancy in the network layer) should a switch fail, someone unplug something, break a cable, etc, the two switches are independent and can provide complete access to the SAN.
  2. There are a minimum of two storage processors internally in the SAN.  This provides load-balancing internally, as well as failover internally within the SAN should a processor fail.

I have seen quite good performance from a single dual-port card connected at 2gbps. My recommendation is to connect at 4gbps to effectively eliminate discussions about HBA latency (more on this another time).  A single dual-port card does little for redundancy and failover on the server, but it does provide load-balancing across the two ports, and does provide for connections to both halves of the SAN, which does have redundancy in the fabric as well as internally in the SAN.

 

As you can see in the illustration, the dual port card has a connection to each switch and each switch is connected to a storage processor.  Working with the setup of a minimum of two data drives, each drive is a 1tb lun spanning 40 physical drives(Previous post).  Each LUN gets assigned a primary storage processor, given two drives, and two storage processors (each with their own read-ahead and buffer caches) each of the two luns should be split between the storage processors.

Monitoring, is topic for another day, but here are some basic numbers to live by (if they don’t make sense… stay tuned, I promise I will get there)

  • IOPS, when calculating IO, use 180 – 200 per drive in the LUN
  • Access time 5ms – 8ms (as measured on the SAN)
  • Disk queue 2 per drive in the LUN
  • 1tb per LUN

The filegroups, numbers of files, file sizes are all going to be part of a later post.

Follow

Get every new post delivered to your Inbox.

Join 373 other followers