First of all, let me start by saying this topic (for those who care) seems to generate quite a bit of conversation. This is the first installment of what will be many posts on the subject; even in writing this one, you will see my struggle to stick to the topic. The context of this ramble is based on a few assumptions:
- We are talking about a Data Warehouse
- You are running SQL Server (hopefully 2008)
- You are connected to a SAN (Storage Area Network)
OK, with that out of the way, we are going to start at the bottom of the stack with some basics.
- RAID-0: Please don't use this. A stripe set built across multiple disks performs well, but if you lose one disk, you have lost the entire stripe set.
- RAID-5: Data is striped across 3 or more disks with distributed parity. This configuration tolerates one disk failure in the set (performance will be degraded until the failed disk is replaced and rebuilt). This is an effective layout for database log files. First, I would recommend running a warehouse in simple recovery mode, so log storage only matters for the duration of a transaction. Second, performance is good for sequential write activity (ideal for simple-mode logging). Third, save your money (and SAN slots): a RAID-5 group gives you more usable physical storage from the same set of disks, since there is no mirroring overhead.
- RAID-10 (1+0): Requires 4 or more drives. Data is mirrored (RAID-1) between pairs of drives, and the mirrored pairs are then striped across the set (RAID-0). This gives you the performance gains of RAID-0, the data security of allowing any single drive in the set to fail, and it lets the SAN leverage some serious technology (read-ahead, buffering, etc.), since it has two copies of the data to work with.
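The capacity trade-off between these two layouts is easy to sketch in a few lines of Python. The drive count and the 146 GB-formats-to-133 GB figure are illustrative assumptions borrowed from later in this post, not vendor numbers:

```python
# Back-of-the-envelope usable capacity for the RAID levels above.
# Drive count and formatted size are illustrative assumptions.

def raid5_usable(drives: int, size_gb: float) -> float:
    """RAID-5 gives up one drive's worth of capacity to parity."""
    return (drives - 1) * size_gb

def raid10_usable(drives: int, size_gb: float) -> float:
    """RAID-10 mirrors everything, so half the raw capacity is usable."""
    return drives / 2 * size_gb

drives, size_gb = 8, 133  # eight 146 GB drives, ~133 GB formatted
print(raid5_usable(drives, size_gb))   # 931.0 GB usable
print(raid10_usable(drives, size_gb))  # 532.0 GB usable
```

Same eight drives, nearly double the usable space on RAID-5, which is the "save your money" argument for the log drives.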
Basics of RAID covered, so how many drives (LUNs) should I present to the server? (The number of data files and many other things will be covered in a later post.)
- Log drive (RAID-5), 150 GB or larger. There should be NOTHING on these drives in the SAN other than log data. If you're sharing the SAN with other applications, DO NOT put other data on these drives. Other logs from SQL Server databases… sure; other application data… don't you dare. The SAN will leverage quite a bit of technology to optimize the read/write activity for a given set of drives in the array; log activity is all similar, so buffer allocation to those drives will be optimized for that activity. If you mix in other activity, you destroy the SAN's ability to optimize for the kind of I/O you're sending.
- TempDB (RAID-10), 150 GB or larger. This should be a separate drive from the others you present to the server. If you can, put it on its own set of physical drives, shared only with tempdb activity from other servers. This drive also hosts the other system databases.
- SSAS (RAID-10), 400 GB or larger. I also allocate a user work database on this drive. Basically, allocate all of the user-related "uncontrollable" activity here… OK, uncontrollable is not quite the right word, but users who build data-mining structures in Excel that create back-end cubes, or who dump rows into a work table, are a little uncontrollable from an admin perspective (more on the controls later). You get the point: if you put all of this stuff on its own logical drive, you limit both the performance impact and the space growth of the ad hoc activity.
- Data drive (x) (RAID-10), 1 TB each (minimum 2 drives), spread across 40 physical drives. These should be 146 GB (or smaller) FAST drives. A 146 GB drive formats to about 133 GB, so 40 of them give you 5,320 GB raw; in a RAID-10 configuration that is (5320/2) = 2,660 GB of allocatable space. From those 40 drives you can therefore carve two 1 TB LUNs, plus a 600 GB LUN for the SSAS/user space. The reason for the 1 TB cap on the LUNs, and how to lay data files onto these 2 (or more) LUNs, are all later topics. (This ramble will go on for a while; stay tuned.)
- Historical data drive (RAID-10), 1 TB or larger, on slower drives. Feel free to deploy 300 GB drives allocated across fewer physical drives (20, maybe even 10). I am trying desperately to stay away from "it depends" in all of this and to give as many actual numbers as I can, but this one depends on how often the old data you put on this drive is accessed. Using compression (more on this later… really, hopefully this post will turn into something useful someday) you can get very good read rates. 300 GB drives in smaller numbers will NEVER perform like their smaller, faster counterparts, but as long as your users can tolerate historical reporting and analysis running slower, this should be fine. In addition, the day-to-day analysis of this older data will likely reside in an SSAS partition that is not normally refreshed.
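The drive-count arithmetic in the data-drive bullet above can be sanity-checked with a few lines of Python. The LUN sizes are the ones from the post (two 1 TB data LUNs plus the 600 GB SSAS/user LUN); treat this as a sketch of the math, not a sizing tool:

```python
# Sanity-check the data-drive math: forty 146 GB drives (~133 GB
# formatted) in RAID-10, carved into the LUNs described above.
# GB figures follow the post's arithmetic (1 TB = 1000 GB).

drives = 40
formatted_gb = 133
raw_gb = drives * formatted_gb      # 5320 GB raw
usable_gb = raw_gb / 2              # RAID-10 mirroring: 2660 GB

luns = [1000, 1000, 600]            # two data LUNs + SSAS/user LUN
allocated = sum(luns)
print(usable_gb)                    # 2660.0
print(usable_gb - allocated)        # 60.0 GB left as headroom
```

The 60 GB left over is headroom in the group; whether you leave it unallocated or fold it into the SSAS/user LUN is an array-level choice.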