T-SQL Tuesday #009 – Beach Time

I almost missed this #Tsq2sday post, as I was on vacation last week, and am just catching up and digging out from my post-vacation pile. So this month’s topic “T-SQL Tuesday #009: Beach Time” is actually very timely! Honestly though, working in a small shop I need to bend the “turn off technology” rule just a little…. But really just a little….

I am actually able to get away. Based on the fact that I am digging out from the post-vacation pile you be the judge if I am successful at getting away without creating problems for myself later… but given the fact that I am taking a time-out from all that to write this post, I think I have managed.

There are several things that come to mind…. Things need to work while you are out, the company needs to function (you still need your job to pay for vacation when you get back) thus my need to bend the rule about no technology… if something really bad happens, somehow, some way, I still need to be accessible, and be able to react (maybe not as quick as if I am at work… but no contact is not generally great). For this I have my cell-phone, as well as a cellular wireless card for my netbook. In general, I have alerts, and monitors that call my phone (I want to know there are issues before someone calls me… vacation or not). If something bad happens to a server or database, I will know it…. if it comes back up clean, I will know it…. As for phone-calls, I will accept calls from my boss. He knows I am on vacation, and respects the need to get away, Other than that, I will not answer. As for the netbook, I try to check on my email once a day (maybe two days), I find it much more relaxing to come back to the office, and have all the system messages, etc already cleared from my inbox, and have some idea of the issues that are hot that need my immediate attention. Showing back up to the office with a hundreds of messages, and then having to sort to find the issues is far worse (in my view) than staying on top of it in the first place. Out of office is set…. I do not respond to meeting requests, questions, etc I am on vacation after all. If I have a message from my boss (the one that respects my need to get away), I will respond… the business needs to run, and if a simple email response can keep things on-track in my absence then OK.

So I guess that gets me to my personal rules.

  1. Set out of office notices on the phone, and email.
  2. Don’t answer calls
    1. Do check messages on the cell phone
    2. Do return a call if it really is necessary
  3. Don’t answer emails
    1. Meeting requests… they can wait (people don’t need to know your checking email)
    2. Questions…. They can wait (unless really necessary… in which case I provide enough info for my boss to answer the question)
  4. Turn OFF tweetdeck … I love you guys…. But when get away, I need to disconnect, and get away….
  5. Don’t tell people you are checking email, and can be reached… the person that needs to know… knows… everyone else, you are on vacation.

Now for the before I go…

I am a Data Warehouse Administrator…. So there are some DBA tasks, and some Load tasks etc.

  1. Space seems to be one of those battles I am always fighting…. Make sure there is enough!! The LAST thing you want is full data files while you are out, and load that don’t work for a perfectly preventable reason. Yes I have monitors for these sorts of things, by why mess with that while you are out.
  2. Ensure the index maintenance jobs are caught-up, and things are working the way they should from a performance perspective…. If something unfortunate should happen, and you miss index maintenance for a week… you will most likely be OK (assuming your index’s are in good shape before you go).
  3. Make sure backups are complete… and up-to-date. Nobody wants to miss backups, but at least be sure you are in good shape before you go.
  4. If there are manual daily tasks that must be performed, make sure the person who is going to be doing them for you know exactly what to do… and what to do with the results of those tasks.
  5. If at all possible… 4 days before vacation…. NOTHING new goes into production. You want to have some assurance that loads are stable, performance is working… the only thing worse than a change causing problems, is that last minute “make things better” change that shoots you in your own foot.

Hopefully you too can get time away from the office…. get the much needed re-charge you need, and still have a stable working environment upon your return!

My Favorite New(ish) 2008 Feature (Resource Governor)

#TSQLTuesday

I have written in the past about setting up the resource governor, which I am counting as my favorite new(ish) feature, so I am taking this TSQL2sday post opportunity do my promised followup for monitoring of the resource governor.

Introduction

To start with, my advice about the resource governor is to start small.  Have some idea of what you want when you are done… do you need three pools, five pools… how may groups do you envision etc.  Once you “think” you know where you want to go, start small.  Start with a basic A/B split (user interactive Vs Batch process for example) and work your way up from there, monitoring the whole way.  Well balanced, the resource governor works very well… all your resources being consumed in a single pool gets you nowhere (and depending on your max settings can set you back quite a bit)…. so to achieve the “well balanced” lets look at monitoring.

The primary method for monitoring the internals of the resource governor is through the Performance Monitor statistics made available.

Through these metrics you will be able to see both the pool and group level information in much the same way that you would normally see sql server performance statistics, however they are now broken down at a lower level (so you can see how your allocations are performing).  Keep in mind an important point about resource governor, if you allocate 10% of the CPU as a max to a pool, that pool can still get 100% of the CPU if nothing else is running on the machine.  The resource governor ONLY “kicks in”, holds back resources, splits resources among pools (however you wish to think of it) if there is competition between pools.  This makes it important to determine how many, and which resources you are going to assign to the pools.  Pool Stats:

I have setup

  • Default – PLEASE don’t get rid of the default pool… I had discussion of this in my prior resource governor post.
  • Internal – Internal DB processes run here
  • adhoc_pool_high – This is where I have my “normal” interactive user queries, reports.
  • adhoc_pool_mde – This is where I have the “High Utilization” tools such as SPSS.  There are tools that will select large amounts of data for mining or other analysis where runtime is not measured in seconds, and can be separated to prevent those very large queries from dominating the machine while still running in reasonable times.
  • load_pool_high – This is where I have batch processes.  There is a scheduled job that changes priorities for this pool to a higher-level overnight for my batch-load processes.
  • maintenance_pool_low – This is where I run long-running maintenance processes that I just need to get done, but don’t care how long it takes… compressing historical data etc.  Things I want to keep out of the way of the rest of the system.

As you can see in the screenshot there is little else running right at this moment, other than the adhoc_pool_high, so in reality that pool has access to 100% of the cpu until there is something that shows up in one of the other pools that needs resources.

Workgroup setups

Assigned to each of the pools contains one or more workgroups, in this case you can see that the 74% of the adhoc_pool_high is all being used by the “special_user” workgroup.  This is where starting simple will help you greatly, there is no association (other than your knowledge… or looking in the database) between workgroups and pools that you can see in performance monitor.

My associations are

Default pool
   Default workgroup
Internal pool
   internal workgroup
Adhoc_pool_high
   Special_user (specific ID’s that I wish to GUARANTEE performance to)
   dashboard_user (Interactive BI dashboard that NEEDS to perform well)
   User Interactive (interactive reports, management studio, etc_
Adhoc_pool_med
   report_user (scheduled reports)
   user_tools (General bucket for tools other than interactive queries)
   user_spss (SPSS gets it own resource allocation to prevent it from overrunning the machine)
Load_Pool_High
   load_batch
Maintenance_Pool_low
   maintenance

It is at this level where you will be monitoring the resource allocations to see that they are meeting your expectations.  Resource governor is really where art and science meet, you will need to determine which workgroups/pools you wish to give priority to, and which ones will take the back-seat.  Once things are running that cross between workgroups that are assigned to different pools you will see the cpu control % numbers take effect.

In addition to the performance monitor stat, which I have personally found most useful for immediate feedback on the performance of the pools there are a few DMV’s that collect information along the way.  It is through these DMV’s that you can understand the associations between the workload group and the associated pool, as well as what your configurations are.

dm_resource_governor_workload_groups  Where you can see what your allocations/configurations are.
dm_resource_governor_resource_pools Where you can see what your allocations/configurations are.
dm_resource_governor_resource_configuration Is it enabled?

T-SQL Tuesday #006: Configure Filestream Storage for BLOBS

Introduction

This is my post for the monthly TSQL Tuesday Blogging event.  With the release of SQL Server 2008, among the many new features was one that at first I took little notice of, FILESTREAM storage options. It did not take long to realize that this new feature has some powerful storage capabilities. You can store, (very simply at that) all of that unstructured data (Images, video, word docs, PDF’s… you name it). This post will give you an overview of how to enable the features necessary to be able to take advantage of this storage option under SQL Server 2008. (Note, to enable this option you need admin rights server or sys admin rights) Also, since this is using NTFS the filegroup that you configure to use filestream storage cannot be a compressed (at the windows OS level) volume. (Right Click the C:\data folder -> Properties -> Advanced Attributes -> Uncheck “Compress contents to save disk space”). PLEASE don’t put this directory on your OS drive… this should go onto a data drive. I configured mine to use a directory on the same drive that I have system databases, and the SSAS data files. Space needs will obviously be determined by what you store, but it should be treated like other data files, and NOT be located on your OS drive.

For the sake of “completeness” you can also enable by using management studio (or at install for that matter), I am describing the 2 options I use most often… tsql, and configuration manager.

TSQL

For those of you who don’t like to script things, or don’t like the ease of clicking check-box’s. (actually any of these really are easy)…. File_stream_access_level is an advanced option, so first you need to show advanced, then you can turn it on.

USE master
Go

EXEC sp_configure ‘show advanced options’
GO

EXEC sp_configure filestream_access_level, 1
GO

RECONFIGURE WITH OVERRIDE
GO

 

Filestream access levels:

  • Disabled = 0 – FILESTREAM support for the instance is Disabled
  • Enabled = 1 – FILESTREAM for Transact-SQL Access is Enabled
  • Enabled = 2 — FILESTREAM for Transact-SQL and Windows (32) streaming access is Enabled

Configuration Manager

I like this option the best… seems to me “configuration manager” is the logical place to manage the configuration of the database service, so (for me at least) this is the first place I go look. And the point-click thing is also not a negative in my mind… save me a few trips to MSDN to find the options.

Pick the instance (in this case default), then right-click select and select properties

Enable the check-boxes:

  • The first check-box turns it on…
  • The second is the name of the windows share name if you enable file-io streaming access (I left it at the default…. Please name this something more intelligent, you don’t want “mssqlserver” on each of your servers, you will get things confused if you have more than 1 server).
  • The third is if you are going to allow remote clients to access the file-stream data.
  • Service needs a restart

You have it turned on, take it for a drive ….

Almost there!

Now that the instance is configured to allow for the storage, you need to create a file-group to store the file-stream data in the database you wish to use.

ALTER DATABASE demo
ADD FILEGROUP filestream_data CONTAINS FILESTREAM
GO

 

Now that you have a filegroup, add a file. The directory cannot exist, it will be built/managed by the instance. In My case I had built J:\SQLData\demo\ before I executed this command.

ALTER DATABASE demo
ADD FILE
(
NAME= ‘demo_filestream_data_1′,
FILENAME = ‘J:\SQLData\demo\filestream’
)
TO FILEGROUP filestream_data
GO

 

Now you need a table to hold the pointers to the filestream data being stored on the server (note the filestream syntax)
I am using a guid for the ID, you can use a identity if you like.

CREATE TABLE dbo.BLOB_EXAMPLE (
I
D UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
BLOB VARBINARY(MAX) FILESTREAM NULL
)

 

Now you insert into the table, I am using some text… you can put whatever you want into the varbinary…. Files images etc

DECLARE @ID UNIQUEIDENTIFIER

SET @ID = NEWID()

 INSERT INTO dbo.BLOB_EXAMPLE
(ID, BLOB)
VALUES
(@ID, CAST(‘tsql Tuesday BLOB example’ AS VARBINARY(MAX)))

 SELECT ID, BLOB
FROM dbo.BLOB_EXAMPLE
WHERE ID = @ID

The ID: BF9A3D53-CC91-410E-9606-E2A43E36312A

 SELECT BLOB.PathName()
F
ROM dbo.BLOB_EXAMPLE
WHERE ID = @ID

 

\\SERVER_NAME\MSSQLSERVER\v1\Demo\dbo\BLOB_EXAMPLE\BLOB\BF9A3D53-CC91-410E-9606-E2A43E36312A

There are several parts to the path you get back

  • “SERVER_NAME” (where is this path…. UNC PATH)
  • “MSSQLSERVER” (the share-name you created near the top of this post)
  • “\v1\Demo\BLOB_EXAMPLE\BLOB” (database, table, field info automatically created for you)
  • “BF9A3D53-CC91-410E-9606-E2A43E36312A” the ID that you can use to reference the BLOB to retrieve the data.

Conclusion

I hope this post is a starting point for you to setup some great SQL Server 2008 functionality. It really is easy, and will allow storage of what is a growing, and ever present world of unstructured data in your databases.

T-SQL Tuesday #005: Cleanup Your Notification Emails With HTML Tables

Introduction

I must admit this post is more along the lines of the “snack” series that (Andy Leonoard|Blog|Twitter) posts, but short-and to the point has some value!  I am writing this one to weigh in on this month’s #TSQL2sday topic about reporting.  So to blatantly steal the “snack” term from Andy, here it is:

TSQL Snack:

As a result of your load process, you send an email to your users to notify them that new data is available….right?  Well you should.  You can send a simple text email saying things are done, which certainly works, or, you can add some value and interest by providing some counts of what was loaded.  (You are auditing your loads…. that is another topic)  Your audit table already has everything you need, and a simple HTML table will get you professional looking output.

DECLARE @tableHTML  NVARCHAR(MAX) ;

SET @tableHTML =

    N'<H1>Warehouse Load Completion</H1>' +

    N'<table border="1">' +

    N'<tr><th>End Date</th><th>Table</th>' +

    N'<th>Rows Loaded</th>' +

    CAST ( ( SELECT td = CONVERT(VARCHAR,exec_start_dt, 110),       '',

                    td = tbl_name, '',

                    td = ISNULL(final_row_cnt - initial_row_cnt, 0)

              FROM dw_dbo.ld_audit

              WHERE audit_key >= (SELECT MAX(audit_key)

                                  FROM dw_dbo.ld_audit

                                  WHERE tbl_name = 'Master')

              AND tbl_name NOT IN ('Step','Datamart')

              ORDER BY (final_row_cnt - initial_row_cnt) DESC, audit_key

              FOR XML PATH('tr'), TYPE

    ) AS NVARCHAR(MAX) ) +

    N'</table>' ;

EXEC msdb.dbo.sp_send_dbmail @profile_name = 'General Notifications',

                             @recipients = 'Group-Distribution-List@your-domain.com; individual@your-domain.com; oncall-email@your-domain.com;',

                             @subject = 'Data Warehouse Load Complete',

                             @body = @tableHTML,

              @body_format = 'HTML';
				

  This example reads all of the audit rows, calculates the number of rows loaded for everything that was loaded since the start of the last “master” (which is my driver) audit row.  The output is put into a HTML formatted table (does not matter now many rows come back), gets a header-title, and column titles, and is sent to a set of distribution lists and email names via dbmail. What you will get is something like this (obscured my production names):

Conclusion

Communication is a good thing!  You should be letting your users know when loads are complete, what kind of data has been loaded, and what to expect.  Since SQL Server 2005/2008 make the HTML tables so simple, there is no reason that those notifications should look “simple”.

Because it’s TSQL2sDay

I know this is not what my blog is about… but a poll seems like the thing to do for today’s topic.

 

Help – My disk performance V

Introduction
This is the fifth post in a series on disk performance pulled it together quickly to be included in some blogging fun for “IO, IO, It’s off to disk we go!”  on T-SQL Tuesday #004
Help — My disk performance      (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)
Help — My disk performance IV (filegroups and files)  

We have covered a lot of territory in the previous 4 posts, hopefully you have an understanding of where you want to be.  In this post I start moving away from the physical layout towards monitoring.  I can’t say it strongly enough, you must monitor you servers, don’t believe someone is going to watch for you, and even if you have people with the responsibility of watching… you need to watch anyway.  Several reasons for this, first off, your responsibility is the Data Warehouse.  I am not speaking badly of other groups monitoring systems, but they have their hands FULL, and may very well have a different tolerance for unacceptable performance due to the number of systems they are responsible for and each systems unique needs. Secondly, each Data Warehouse has a “feel” of its own that you will begin to understand with consistent monitoring.  You will get to know your normal performance levels, and at a glance at a graph be able to know someone has a query that needs tuning, or an index opportunity.  The monitoring is where the “art” and “science” begin to intersect. (Query tuning… a topic for another time) is the other place where “art” and “science” cross paths.   

At the most basic, monitoring for disk-IO starts with server monitoring from the windows perf-mon tool.  Start a collection process on the server that writes to a new file every 24 hours, in this collection (at a minimum) collect the avg disk queue (for each drive), avg disk reads per sec (for each drive), avg disk writes per sec (for each drive), avg cpu (total is fine).  There are other measures that are worthwhile – sql memory, kernal memory and quite a few others…. but this topic we are looking a disk-IO… trying to stay on topic (and will talk about monitoring tools that go beyond perfmon shortly).  

Now that you’re collecting data, you also need to know when things are happening on your server.  At the highest level:  

  1. When does your index maintenance run, and for how long? (you have automated index reorganize/rebuilds based on percent fragmentation right?!?… also another topic for later)
  2. When does your backup run, and for how long? (are you using SQL Server 2008 backup compression? … you should)
  3. Do you have other maintenance that is regularly scheduled (you’re doing a weekly dbcc checkdb right?)
  4. When does your normal nightly data-load start, and how long does it normally run?
  5. Do you have scheduled reporting? (monthly reports, thing that run off-hours due to their size etc)
  6. Do you have reporting that is triggered by the completion of a load?
  7. Do you have resource group allocations scheduled to change based on time/load etc? (Are you using SQL Server 2008 resource governor? …also another topic for later, but you should)

Loading up a representative day (don’t start this exercise with a Saturday if that is your least utilized time, nor should you start by looking at Sunday if that is the day you have a checkdb running for 10 hours) into the performance monitor tool to review the results.  Ultimately, you need to understand all of the time periods, including maintenance windows, but for now, lets start with a normal representative day.  

Average disk queue
Assuming you have 40 drives for each of you data drives, then the simple answer is, your avg disk queue of 2 per drive should stay bellow 80 on each of your data drives.  Don’t worry too much at this point about a spike above 80… if you sustain a queue above 80 you will have performance issues.  

It is possible to have a disk-queue reported by perf-mon on the windows side, and the disks be idle, if you are seeing disk queues, you should get your SAN administrator involved to look at the disk statistics for the LUN in question on the SAN.  IF you have no disk pressure in the SAN, and the queue is high in Windows, you have HBA latency that needs to be addressed.  

HBA Latency 
 This is really unlikely – even with a dual-port 2gb card this is not likely the slow part,  with a dual-port 4gb card it is very very unlikely. Most likely if you are seeing latency on the HBA there is something that is set incorrectly, or needs a firmware upgrade.
If you are actually seeing latency:
–Validate your port speed setting for both the card and the SAN switch (I have seen a switch port set to 100mb, you WILL have a performance impact)
–Validate your HBA firmware versions and compatibility with SAN OS versions, your SAN vendor will have very specific recommendations.
–Validate the SAN connections to the switches

Average reads/writes
Assuming you have 40 drives for each of your data drives, the simple answer is, the reads/writes should stay below 180-200 per drive.  This allows for 7200 – 8000 IO’s per second on each drive.  Again, don’t worry about spikes above this level unless they are sustain, or very close together. 

Average CPU
This has less to do with disk-IO, and more to do with overall performance measures.  An important thing to understand, and provides a large amount of insight into the overall responsiveness of your server.  Get to know what your normal CPU profile looks like in the different points of time I mentioned above. 

Examples
This chart shows a 9 drive array that has issues.  A little hard to read here, but the queues (blue) are WELL above 18 and the avg read/write activity is WELL above the 1600 – 2000 level.

This chart shows a 40 drive array that is performing within acceptable tolerances.  These queues show some spikes in activity, still remaining bellow the allowable 80 for this configuration, and going back to an earlier point about knowing what happens when…. those are all during maintenance and backup windows.

Follow

Get every new post delivered to your Inbox.