Help — My disk performance IV

This is the fourth post in a series on disk performance
Help — My disk performance       (RAID how/why what to put where)
Help — My disk performance II  (Physical connections to the SAN)
Help — My disk performance III  (Partitioning and compression)

Now that we have partitioning covered, filegroups should actually be relatively simple, because it just becomes a physical discussion about the logical splits we just arrived at.  Just to restate the obvious, PLEASE come up with a naming convention for your filegroups, file names, and directory structures. 10 minutes (literally) of thought about this up front will make monitoring, scripting, moving, etc. MUCH easier later.

Data Files
The Windows OS operates on file handles, and opinions vary depending on the number of processors and how much concurrent read/write activity you have; there are plenty of “it depends” answers. What I have found through testing, with 8 processors and the disk configuration discussed to this point in this blog, is one file per processor for data files, plus one. (Log files in a moment.)  More than 8 files in the filegroup seemed to add little measurable value; the 9th file is my insurance policy for unexpected growth.  Understanding how SQL Server writes to the files is also important: the engine will do its best to keep the amount of data equal across all the files in a filegroup.  So if you have a filegroup with 7 files at 75% full and you add an 8th file, all the new data will go to the 8th file until it is 75% full, effectively removing the benefit of multiple files and creating a hotspot on that file.  (I will have more specifics, DMVs, etc., for monitoring in a later post.) Some basic rules:

  1. Pre-allocate space to all 8 data files.
  2. Do not allow autogrowth on the 8 main files in the filegroup.
  3. Set the initial size small on the 9th file, and allow autogrowth up to a number you’re comfortable with (I allow the ‘extra’ file to grow to the same size as the primary files). I don’t like autogrowth, but I like things failing with space issues even less.
  4. Monitor the extra file for growth; if it is growing, you messed up your space projections, and you need to add space to the other 8 files.
  5. It is best to get the calculations correct and pre-allocate the 8 files; this will prevent physical disk fragmentation as the files fill.
  6. Monitor disk fragmentation on the server.
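The rules above can be sketched in T-SQL. This is a hypothetical example only: the filegroup name, file names, paths, and sizes are illustrative placeholders, not a prescription.

```sql
-- Hypothetical: pre-allocate 8 fixed-size data files plus a 9th
-- "insurance" file that starts small and may autogrow to the same size.
ALTER DATABASE Example_DB ADD FILEGROUP Data_2005;
GO
ALTER DATABASE Example_DB
ADD FILE
    (NAME = Example_db_data_2005_1,
     FILENAME = 'J:\SQLDATA\Example_DB\Example_db_data_2005_1.ndf',
     SIZE = 10GB, FILEGROWTH = 0),
    -- ... files 2 through 8 follow the same pattern ...
    (NAME = Example_db_data_2005_9,
     FILENAME = 'J:\SQLDATA\Example_DB\Example_db_data_2005_9.ndf',
     SIZE = 1GB, FILEGROWTH = 1GB, MAXSIZE = 10GB)
TO FILEGROUP Data_2005;
```

The `FILEGROWTH = 0` on the main files is what enforces rule 2; the `MAXSIZE` on the 9th file caps the insurance policy at the size of a primary file.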

 Primary – personally, I have nothing against PRIMARY, but I put nothing into it, and I monitor what is in that space.  If something lands in PRIMARY, someone (probably me) built something without thinking all the way through.  I would still allow for 8 data files pre-allocated to a reasonable size relative to your other data spaces: if something does go into PRIMARY, you don’t want things failing or performing poorly, and you can move anything that needs to be kept in the background later.
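One way to monitor what lands in PRIMARY is a catalog query. A minimal sketch, covering non-partitioned objects only (partitioned indexes map to a partition scheme rather than directly to a filegroup, so they fall out of this join):

```sql
-- List user objects whose indexes/heaps were placed in PRIMARY.
SELECT o.name AS object_name, i.name AS index_name
FROM sys.indexes AS i
JOIN sys.objects   AS o  ON o.object_id = i.object_id
JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
WHERE fg.name = 'PRIMARY'
  AND o.is_ms_shipped = 0;
```

An empty result set means nobody slipped anything into PRIMARY.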

Expanding upon a diagram we looked at earlier, you can see the number of filegroups and their relationship to the physical disks:

Beyond the annual partitions, you see a filegroup for “Dim Data” (dimension data), “Dim Idx” (dimension indexes), “Data 1/2/3” (generic data spaces), and “Index” (non-partitioned indexes).  You are going to have some data that does not belong to dimensions and is not partitioned.  Hopefully all of the large stuff is partitioned, but you need a non-partitioned home for some large tables as well.  If you KNOW two of these large tables are always used together, put them in data spaces that are on different drives.  Depending on the quantity of non-partitioned tables, you may want to consider a second index space to separate the index IO from the table IO. 
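A sketch of how those filegroups might be declared and used. The names mirror the diagram; the table names and columns are hypothetical, and each filegroup would be backed by files on a different physical drive (file definitions omitted for brevity):

```sql
-- Hypothetical filegroup layout from the diagram.
ALTER DATABASE Example_DB ADD FILEGROUP Dim_Data;
ALTER DATABASE Example_DB ADD FILEGROUP Dim_Idx;
ALTER DATABASE Example_DB ADD FILEGROUP Data_1;
ALTER DATABASE Example_DB ADD FILEGROUP Data_2;

-- Two large tables that are always queried together go to
-- data spaces backed by different drives:
CREATE TABLE dbo.BigTableA (Id INT NOT NULL) ON Data_1;
CREATE TABLE dbo.BigTableB (Id INT NOT NULL) ON Data_2;
```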

Log files
Basically the same principles apply to log files (remember, they are going to their own drive).  Through my testing, due to the high level of activity, logging can benefit from additional files (3 per CPU).  To avoid the overhead associated with growing log files, set them to their maximum size and do not allow autogrowth.  Because there is no autogrowth on these files, you need to ensure you have sufficient log space available to manage data loading and queries, as well as monitoring in place to detect logs filling.  In addition to these file management recommendations, a Data Warehouse should use the SIMPLE recovery model. (Backup and recovery design for a Data Warehouse is another topic.)
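The log setup described above might look like this; the logical file name and size are placeholders, while the recovery-model change is standard syntax:

```sql
-- Hypothetical: fix the log at its projected maximum size with no
-- autogrowth, and put the warehouse in the SIMPLE recovery model.
ALTER DATABASE Example_DB
MODIFY FILE (NAME = Example_db_log_1, SIZE = 50GB, FILEGROWTH = 0);

ALTER DATABASE Example_DB SET RECOVERY SIMPLE;
```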

Naming Conventions
Each data drive should have a SQLDATA directory, followed by a directory for each database (please don’t be one of those people who drop a few hundred files from all the databases on a server into one directory).  Beyond that, each filegroup should get a name.

J:\                              <—- Data drive
      SQLDATA                    <—- SQL data directory (this will matter when you have SSAS on the same drive as data); the log drive gets SQLLOG
            Example_DB           <—- Each database has a directory
                  Example_db_data_2005_1  <—- Name each file: dbname + file type (data/log) + year + number

 As you can well imagine, there are going to be quite a number of files, so maintaining order within the naming will give you the ability to script, and otherwise programmatically monitor, things.  This also provides a very clear delineation of which data resides in which file.  When it comes time to move the oldest data to a different drive, it is as simple as issuing the command in the database to provide the new location of the file, shutting down the database, physically copying the file to the new location, and restarting the database.  This will become important once your primary “fast” drives start to fill, and the old (least used) data can be migrated to larger (slower), less expensive LUNs.
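The move itself can be sketched as follows; the drive letters and file name are illustrative, and taking the whole database offline (rather than just the filegroup) is the simplest variant of this procedure:

```sql
-- 1. Tell SQL Server where the file will live after the move.
ALTER DATABASE Example_DB
MODIFY FILE (NAME = Example_db_data_2005_1,
             FILENAME = 'K:\SQLDATA\Example_DB\Example_db_data_2005_1.ndf');

-- 2. Take the database offline, copy the physical file to the new
--    location at the OS level, then bring the database back online.
ALTER DATABASE Example_DB SET OFFLINE;
-- (copy the .ndf from J:\SQLDATA\Example_DB\ to K:\SQLDATA\Example_DB\)
ALTER DATABASE Example_DB SET ONLINE;
```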


7 Comments on “Help — My disk performance IV”

  1. sqlserverio July 23, 2010 at 10:13 am #

    Your log files statement is not correct. Logs are written to sequentially, and one at a time. They don’t operate like data files and do not fill in a round-robin, equal-fill method. There isn’t a performance gain from multiple log files. Paul Randal has several great write-ups on how logging in SQL Server works.
    Also, how you saw a perf bump from 3 log files per CPU I don’t know. Did you try one log file sized like the three-log-files-per-CPU method?

    • stefbauer July 23, 2010 at 10:32 am #

      THANKS for the feedback! I will correct/clarify my log-fill explanation.

      I did arrive at the number of log files per CPU by testing.

  2. sqlserverio July 23, 2010 at 10:35 am #

    Is a test reproducible by a third party? Is it something I could run and test? If so I would watch what is actually happening at the file level and see what may be speeding things up with the multiple log files.


