My Favorite New(ish) 2008 Feature (Resource Governor)

#TSQLTuesday

I have written in the past about setting up the resource governor, which I am counting as my favorite new(ish) feature, so I am taking this TSQL2sday post opportunity do my promised followup for monitoring of the resource governor.

Introduction

To start with, my advice about the resource governor is to start small.  Have some idea of what you want when you are done… do you need three pools, five pools… how may groups do you envision etc.  Once you “think” you know where you want to go, start small.  Start with a basic A/B split (user interactive Vs Batch process for example) and work your way up from there, monitoring the whole way.  Well balanced, the resource governor works very well… all your resources being consumed in a single pool gets you nowhere (and depending on your max settings can set you back quite a bit)…. so to achieve the “well balanced” lets look at monitoring.

The primary method for monitoring the internals of the resource governor is through the Performance Monitor statistics made available.

Through these metrics you will be able to see both the pool and group level information in much the same way that you would normally see sql server performance statistics, however they are now broken down at a lower level (so you can see how your allocations are performing).  Keep in mind an important point about resource governor, if you allocate 10% of the CPU as a max to a pool, that pool can still get 100% of the CPU if nothing else is running on the machine.  The resource governor ONLY “kicks in”, holds back resources, splits resources among pools (however you wish to think of it) if there is competition between pools.  This makes it important to determine how many, and which resources you are going to assign to the pools.  Pool Stats:

I have setup

  • Default – PLEASE don’t get rid of the default pool… I had discussion of this in my prior resource governor post.
  • Internal – Internal DB processes run here
  • adhoc_pool_high – This is where I have my “normal” interactive user queries, reports.
  • adhoc_pool_mde – This is where I have the “High Utilization” tools such as SPSS.  There are tools that will select large amounts of data for mining or other analysis where runtime is not measured in seconds, and can be separated to prevent those very large queries from dominating the machine while still running in reasonable times.
  • load_pool_high – This is where I have batch processes.  There is a scheduled job that changes priorities for this pool to a higher-level overnight for my batch-load processes.
  • maintenance_pool_low – This is where I run long-running maintenance processes that I just need to get done, but don’t care how long it takes… compressing historical data etc.  Things I want to keep out of the way of the rest of the system.

As you can see in the screenshot there is little else running right at this moment, other than the adhoc_pool_high, so in reality that pool has access to 100% of the cpu until there is something that shows up in one of the other pools that needs resources.

Workgroup setups

Assigned to each of the pools contains one or more workgroups, in this case you can see that the 74% of the adhoc_pool_high is all being used by the “special_user” workgroup.  This is where starting simple will help you greatly, there is no association (other than your knowledge… or looking in the database) between workgroups and pools that you can see in performance monitor.

My associations are

Default pool
   Default workgroup
Internal pool
   internal workgroup
Adhoc_pool_high
   Special_user (specific ID’s that I wish to GUARANTEE performance to)
   dashboard_user (Interactive BI dashboard that NEEDS to perform well)
   User Interactive (interactive reports, management studio, etc_
Adhoc_pool_med
   report_user (scheduled reports)
   user_tools (General bucket for tools other than interactive queries)
   user_spss (SPSS gets it own resource allocation to prevent it from overrunning the machine)
Load_Pool_High
   load_batch
Maintenance_Pool_low
   maintenance

It is at this level where you will be monitoring the resource allocations to see that they are meeting your expectations.  Resource governor is really where art and science meet, you will need to determine which workgroups/pools you wish to give priority to, and which ones will take the back-seat.  Once things are running that cross between workgroups that are assigned to different pools you will see the cpu control % numbers take effect.

In addition to the performance monitor stat, which I have personally found most useful for immediate feedback on the performance of the pools there are a few DMV’s that collect information along the way.  It is through these DMV’s that you can understand the associations between the workload group and the associated pool, as well as what your configurations are.

dm_resource_governor_workload_groups  Where you can see what your allocations/configurations are.
dm_resource_governor_resource_pools Where you can see what your allocations/configurations are.
dm_resource_governor_resource_configuration Is it enabled?

2008 Resource Governor – Does it work?

Clip to Evernote

Yes….it works….

 

Introduction

I am not going to pretend to be the expert on this topic, so please feel free to comment.  I am running resource governor in production (in a Data Warehouse environment), and simply want to share some of my observations, tips that worked, and “words of wisdom”.  This post is not a detailed instruction on how to setup the resource governor, there is a GREAT doc “Using the Resource Governor” Sql Server Technical Article by Aaron Bertrand, and Boris Baryshnikov.

Things to note:

  1. Read the doc!  There are lots of practical hints, monitoring recommendations etc.  Don’t jump in before you understand it.
  2. Keep the proc simple – especially to start off with.  I over-engineered by first attempt at the classifier proc with lookup tables for the different groups.  My aim was to make the process dynamic, and by adding/removing things from the tables at different times I could control how the groups behaved.  Not a great result…. this proc is called A LOT… make sure what you build is well optimized.
  3. Create a failsafe “out” for your account.  The FIRST thing I check in the proc is a test for my user-name and assign myself to a group.  May be paranoid here, but, if this proc goes badly, you may not get a connection to be able to alter it easily…. the last thing you want is to have to shutdown a production instance because you are trying out code that just simply did not work….. I KNOW we test, but things happen that shouldn’t.
  4. Keep the group setup simple – even if you have grand plans, start simple.  Start with your worst offending application, create the group for it, and let everything else fall into another group.  This will basically get you default (system processes), BigApp group (for me that is SPSS),  User Group (everything else).
  5. Understand the utilization you currently have.  I am not being smart… you need to do some monitoring before you start.  What does your CPU/memory utilization look like before you start.  How much parallelism are your large queries getting (the ones you’re going to put into the BigApp group).  If you don’t know where you are starting from, it is hard to know if things are getting better (or worse).
  6. Understand your target, nothing precise, this is “art”.  For me, I had a few applications that needed “special” treatment to allow everything to live together and get consistent results. So I wanted to reduce the parallelism on SPSS so it would not take-over (for long periods of time) and reduce performance for everything else.
  7. Set reasonable limits.  Especially at first, and even after running, never cut your default pool too short, you will be in for a surprise (not the good kind) if something drops into default, and you have allocated all the available resources to the other pools.
  8. Keep it simple (now I am being smart, but really… start slow with this thing you will thank me later)
  9. Use the resource governor!  A healthy dose of caution is good, but, it really does work.

Understanding the pools

Assuming you have an idle system, and a query comes into a pool that is allocated a max of 50% of cpu, it will be allowed to use all 100%, the governor does not prevent “over allocation” to a pool if the resources are idle. 

There are times (untill the machine has been truly idle for a little while) that you will see cpu utilization right at your minimum cpu guarantee.  This caused me concern at first… I had set aside a guarantee of 15% for a adhoc_pool_high, and then saw a jump (to 15%) in cpu utilization.  I thought that the resource governor was suddenly using 15% cpu overhead, that is not the case.  The governor was holding onto 15%, and anything submitted to that pool was automatically given that 15%.  You will reach an idle cpu condition, and the 15% will drop off, but if you see it, don’t assume the worst…. it is likely working exactly as you intend.

 

Monitoring

This will be another topic for at least one (if not more) posts…. but…. because I like DMV’s so much, here are a few to go play with:

  • dm_exec_sessions  (memory/cpu usage) either individually for each session…. or by group, the group_id is here.
  • dm_exec_requests (waits and times) too much cxpacket you may want to lower DOP for the group.
  • dm_exec_cached_plans  (plans cached, for the pool)  freeproccache also takes a new argument to free cache for a given pool-id
  • dm_exec_query_memory_grants (bunches of memory request info) it even gives an ideal_memory_kb for the ideal amount of memory for a request.  Particularly if you don’t have lots of memory to play with, this will give you good tuning advice.
  • dm_exec_query_resource_semaphores  – are you getting the memory you specified in the config of the pool.
  • dm_os_memory_brokers – How is memory being utilized over time.
  • dm_os_memory_cache_entries – detailed info about cache usage buffer vs non buffer.

In addition to the DMV’s there are changes to perf-mon data that you can see at the pool level.  All of this goes back to my earlier point about understanding your system before you start playing with the resource governor.  You will be quickly overwhelmed by stats, and the more pools you setup, the more stats you will be overwhelmed with.

What I ended up with

This is one of those things that changes, gets tweaked over time… but right now here is where I am:

  1. adhoc_pool_high – min cpu 10% (I set this down to 0% via a job-step during load times), but during the day this guarantees the interactive report-user (from SSRS) available CPU.
    1. Report_user group – the common account we use from SSRS lands here
    2. special_user group – the “failsafe” group for my account
  2. adhoc_pool_med – This is where all of the “normal” query tools, and users connect
    1. user_interactive group – has the highest priority in the pool, and 50% of the memory, this is basically all the users that are not otherwise specified
    2. user_spss group – has the lowest priority in the pool, 25% of the memory, and reduced parallelism to 4.  These are LARGE long running queries, this help prevent them from taking over.
    3. user_tools group – has medium priority in the pool, and contains things like excel, office services, tableau.  Reduced parallelism to 7 for this group, not a big reduction, but allows for even execution among the tools.
  3. Default – Default pool …. things land here that are not classified
  4. internal – Internal engine pool…. anything running as SA, or the account the sql engine is running as lands here
  5. load_pool_high – pool for batch-load jobs. (controled by user-account and machine they are connecting from)
  6. maintenance_pool_low – pool with reduced CPU max, this is intended for maintenance tasks that can run for a long time, and reduce the impact on the overall performance of the system. 

I hope this helps you take the steps to begin working with the resource governor!  It will take some work to get everything playing together just the way you want, and there will always be some queries that break the mold, but it is well worth the efforts.

Follow

Get every new post delivered to your Inbox.

Join 386 other followers