---------------------------------------------------------------
PSS ID Number: Q100110
Article last modified on 06-03-1994

3.10

WINDOWS

---------------------------------------------------------------
The following information applies to:

 - Microsoft Windows NT Advanced Server, version 3.1
---------------------------------------------------------------

SUMMARY
=======

This article explains the differences between redundant arrays of inexpensive disks (RAID) levels 0 through 5, and which of them Microsoft Windows NT Advanced Server supports. This article also explains some of the advantages and disadvantages of the various RAID configurations.

MORE INFORMATION
================

Microsoft Windows NT Advanced Server supports only RAID 0, RAID 1, and RAID 5. Fault tolerance and disk array implementations, while generally based on the design described here, vary considerably among manufacturers.

RAID 0
------

RAID 0 is a disk array that implements striping without any drive redundancy. It offers no fault tolerance and is less reliable than a single-drive implementation; its only advantage is speed. RAID 0 is suitable for certain special applications, such as scientific analysis or imaging, where compromised system reliability can be tolerated.

RAID 1
------

RAID 1 is disk mirroring. Two drives store identical information so that one is a mirror of the other. For every disk operation, the system must write the same information to both disks. Because dual write operations can degrade system performance, many implementations employ duplexing, where each mirror drive has its own host adapter. While the mirror approach provides good fault tolerance, it is relatively expensive to implement, because only half of the available disk space can be used for storage while the other half is used for mirroring. Novell NetWare, in particular, incorporates support for disk mirroring.

RAID 2
------

RAID 2 uses extra check disks, with data bits striped across the data and check disks.
The data includes an interleaved Hamming code, which can be used to detect and correct single-bit errors as well as detect double-bit errors. Because of the amount of information required for the check bits, several check disks are required to implement RAID 2. It is optimal for reading and writing large data blocks at high data transfer rates, but smaller block reads are inefficient. The read, modify, and write operations required for small block writes also result in poor performance. RAID 2 is generally impractical for smaller systems and is not available with Microsoft Windows NT Advanced Server.

RAID 3
------

RAID 3 uses a single redundant check disk (sometimes referred to as a parity disk) for each group of drives. Data written to the RAID 3 disk array is bit striped across the data disks. The check disk receives the XOR (exclusive OR) of all the data values written to the data drives. Because data transfers to and from individual drives occur only in unit sector multiples, the minimum amount of data that can be written to or read from a RAID 3 disk array is the number of data drives multiplied by the number of bytes per sector (this is known as a transfer unit). This option is not available with Microsoft Windows NT Advanced Server.

RAID 4
------

RAID 4 offers a disk array architecture that is better optimized for transaction processing applications than RAID 3. RAID 4 performs block striping or sector striping on the data on the drives, while RAID 3 performs bit striping. Thus, with RAID 4, one entire sector is written to one drive, the next sector is written to the next drive, and so on. This technique allows multiple unrelated sectors to be read simultaneously, and it is particularly valuable for small reads that need to access only a single drive in the array. RAID 4 dedicates one entire disk for storing check data, allowing data from a failed drive to be easily recovered.
While this approach allows multiple reads to occur simultaneously, with different sectors from different drives, write operations are bottlenecked. Because the single check disk must be written to during every write operation, only one write operation can take place at a time. This option is not available with Microsoft Windows NT Advanced Server.

RAID 5
------

Unlike RAID 4, which dedicates a single physical disk for check data, RAID 5 dedicates the equivalent of one entire disk for storing check data but distributes the check data over all the drives in the group. For example, sector 1 of disk 5 may be assigned to hold the check data for sector 1 of the remaining data drives and so on. Because the check data is simply the XOR of all the write data values for the corresponding sector on each of the data disks, as long as the old sector data and the old check data values are known, the new check data for a single sector write can be calculated without having to read the corresponding sectors from the other data disks. Thus, only two disks are involved in a single sector write operation: the target data disk and the corresponding disk that holds the check data for that sector. This is in contrast to the RAID 3 implementation, which requires all drives in a group to be read and written when a single sector size write occurs.

The primary benefit of the RAID 5 distributed check data approach is that it permits write operations to take place simultaneously. It also allows multiple reads to take place simultaneously and is efficient in handling small amounts of information. This is the preferred option when setting up fault tolerance in Microsoft Windows NT Advanced Server.

How RAID 3, RAID 4, and RAID 5 Recover and Rebuild
--------------------------------------------------

RAID 3, RAID 4, and RAID 5 disk array designs allow for data recovery. When data is written to multiple data disks, the XOR of all the data values is written to the check disk.
If any one disk fails, the missing data from that disk can be determined (recovered) by taking the XOR of the data values from the remaining data drives and the check disk. This operation can be implemented in either the system software or the host adapter.

Additional reference words: 3.10 hrdwr filsys
KBCategory:
KBSubCategory: fautol

=============================================================================

Copyright Microsoft Corporation 1994.

-----------------------------------------------------------------------------

Sparing Bad Disk Sectors

The file system verifies all sectors when it formats a volume. All faulty sectors are spared from service. Additional Windows NT fault-tolerance services add sector-recovery capabilities to the system.

When there is a sector I/O failure in a fault-tolerant system with redundant copies of the data, the Windows NT Fault Tolerance driver attempts to spare the bad sector from use. This includes performing a device control asking the disk device driver to spare the sector from use. Small Computer System Interface (SCSI) devices can do this, but AT devices [that is, Integrated Device Electronics (IDE) and Enhanced Small Device Interface (ESDI)] cannot. When the sector cannot be spared, the correct information obtained from the redundant copy is returned to the file system with a status message stating that there is a faulty sector in the I/O. The Windows NT File System (NTFS) reacts to this message by attempting to locate the failure and sparing the bad sectors by removing their usage from the sector map of the file system. The Administrator is also notified in the Event Viewer program of the potential for data loss if the partition containing the redundant copy also fails.
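
For a set protected by check data, the "redundant copy" mentioned above is rebuilt with the XOR operation described earlier for RAID 3, RAID 4, and RAID 5. The following is a minimal Python sketch of that arithmetic only, not of how FTDISK.SYS is implemented. The function names (xor_blocks, rebuild_missing, update_parity) and the 4-byte "sectors" are hypothetical and chosen for readability.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the single missing block (data or check) from all survivors."""
    return xor_blocks(surviving_blocks)


def update_parity(old_parity, old_data, new_data):
    """Small-write update: needs only the old data and the old check data."""
    return xor_blocks([old_parity, old_data, new_data])


# Three hypothetical data sectors, shortened to 4 bytes each.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the second data disk and rebuilding its sector.
recovered = rebuild_missing([data[0], data[2], parity])

# Overwrite the first sector; the other data disks are never read.
new_sector = b"\xff\x00\xff\x00"
new_parity = update_parity(parity, data[0], new_sector)
```

Because XOR is its own inverse, the same routine rebuilds either a lost data sector or a lost check sector, and update_parity touches only two disks, which matches the single-sector write described for RAID 5.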
-----------------------------------------------------------------------------

Looking at Redundant Arrays of Inexpensive Disks

What we touch on here is how Performance Monitor sees various redundant arrays of inexpensive disks (RAID) and fault-tolerant disk configurations. We'll also mention some issues relating to their relative performance as observed on one computer, but it would be an error to extrapolate these results to some other system. You need to perform these experiments on your own configurations and under your own real or anticipated workloads to make judgments about optimal disk configuration.

In our example, we have (as physical unit 0) a hardware RAID array of 4 spindles and 800 MB capacity. We partitioned this into drive C (300 MB), and drives F and G, which are 250 MB each. We also have three other disk units with about 340 MB capacity each to play with. We created a 200 MB file on a single partitioned drive D, and another one on a mirrored partition on the other two disks, drive E. After we finished the experiments on drives D and E, we rearranged those three spindles as a single striped partition for drive D (no parity) and created a 200 MB file on that. We had two disk controllers, one for the hardware RAID array, and another for all three of the other disk units.

All the 200-MB file creation times were 420 seconds, except for the striped partition on drive D which created itself in 314 seconds. In the next two figures we show the difference in behavior of drive D as a single drive and as a striped volume.

Figure 4.23 File creation on a single spindle

Figure 4.24 File creation on a three-spindle striped volume without parity

Notice the Avg. Disk sec/Write is 0.081 for the single unit and 0.061 for the striped set. This results in higher Disk Bytes/sec. Striping reduces seeking and therefore improves performance.
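
To put the creation times above in throughput terms, here is a small Python sketch of the arithmetic. It uses only the numbers quoted in the text (200 MB, 420 seconds, 314 seconds, and the two Avg. Disk sec/Write values). The variable names are hypothetical, and the results describe this one test computer only.

```python
FILE_SIZE_MB = 200

# Creation times quoted in the text, in seconds.
creation_seconds = {
    "single spindle": 420,
    "three-spindle stripe (no parity)": 314,
}

# Average sustained write rate while creating the file.
rates_mb_per_sec = {
    name: FILE_SIZE_MB / seconds for name, seconds in creation_seconds.items()
}

# How much faster the striped volume finished the same job.
speedup = (
    creation_seconds["single spindle"]
    / creation_seconds["three-spindle stripe (no parity)"]
)

# Ratio of the Avg. Disk sec/Write counters for the same two runs.
write_latency_ratio = 0.081 / 0.061

for name, rate in rates_mb_per_sec.items():
    print(f"{name}: {rate:.3f} MB/sec")
print(f"speedup from creation times: {speedup:.2f}x")
print(f"ratio of Avg. Disk sec/Write: {write_latency_ratio:.2f}x")
```

The two ratios come out close to each other (about 1.34 and 1.33), which is consistent with the shorter average write time accounting for the faster file creation.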
In Figure 4.24, drive D is striped across units 1, 2, and 3. Let's look at the performance of the Physical Disks.

Figure 4.25 Physical disk statistics for a striped volume

Whoops. What happened to units 2 and 3? Well, DISKPERF.SYS cannot see which physical volume the write operation executes on. This is because DISKPERF.SYS is located above the fault-tolerant disk driver FTDISK.SYS in the driver stack, as shown in Figure 4.1. The decision as to which spindle will get the data is made by FTDISK.SYS and therefore is invisible to DISKPERF.SYS. The only way to get visibility would be to add another measurement driver below FTDISK.SYS on the stack. But this would increase the overhead, and we elected not to do it. The additional information is not important enough to warrant the overhead.

Mirrors, stripes, and hardware RAID devices all share this Performance Monitor characteristic: Performance Monitor summarizes all Physical Disk statistics under the first unit assigned to the disk array.

The next experiment was to read 100 unbuffered (with no file system cache), normally distributed records of 8192 bytes from the file on each drive type.

Figure 4.26 Reading from three-spindle striped volume

Figure 4.27 Reading from four-spindle hardware RAID

The hardware RAID is slower at this, for some reason. Perhaps its physical drives are slower.

The next two figures show our old test of rereading records of various sizes to determine maximum disk throughput. The first figure shows the striped volume, the next one shows the hardware RAID.

Figure 4.28 Disk throughput test for a three-spindle striped volume

Figure 4.29 Disk throughput test for a four-spindle hardware RAID

Well, now isn't that interesting! The RAID device is quite impressive at higher transfer sizes, and increases monotonically in performance as the transfer size does. The striped volume is not so pretty. It has spots where the performance degrades due to missed revolutions.
(Because we are rereading the same record over and over, only one spindle participates in this test.) But there is a serendipitous node at the 8192 transfer size, which just happened to be the size in our test case.

Which of these two technologies would you rather spend your money on? You need to understand the transfer size characteristics of your traffic to be sure. For 4096- or 8192-byte transfers, the striped volume wins; for transfers larger than 5 pages, the hardware RAID is the clear winner. Now don't get us in trouble by trying to use these results directly in your shop. There are a lot of variables. With a different controller, drive, or processor you get different results.

Another way to alter the outcome is to try writing instead of reading. When we substitute writing for reading in the above test, we get 0.016 seconds per record for the striped volume, 0.028 for the single spindle, 0.030 for the hardware RAID, and 0.041 for the mirror. Writing is slower on the mirror because both spindles must be written. If we had another controller for one half of the mirrored pair we would have possibly seen an improvement, not to mention better fault tolerance.

-----------------------------------------------------------------------------

----------------------------------------------------------------------
PSS ID Number: Q113933
Article last modified on 04-25-1994

3.10

WINDOWS

----------------------------------------------------------------------
The information in this article applies to:

 - Microsoft Windows NT operating system version 3.1
 - Microsoft Windows NT Advanced Server version 3.1
----------------------------------------------------------------------

SUMMARY
=======

Windows NT and Windows NT Advanced Server disk striping (RAID Level 0) creates a disk file system called a stripe set by dividing data into blocks and spreading them in a fixed order across all disks in an array.
By adding data to all partitions in the set at the same rate, disk striping offers the best performance of all Windows NT disk management strategies.

Windows NT Advanced Server allows you to establish fault tolerant disk striping with parity (RAID Level 5), which stores parity information along with striped data on different disks in the array for redundancy. Disk striping with parity is available only with Windows NT Advanced Server, not with Windows NT.

The rest of this article describes disk striping with and without parity in Windows NT and Windows NT Advanced Server.

MORE INFORMATION
================

Disk Striping in General--With or Without Parity
------------------------------------------------

- Stripe sets are user-transparent: when they are created, all partitions are assigned the same drive letter.

- All partitions in a stripe set are the same size. If you select free disk areas of different sizes when you create a stripe set, no stripe will be larger than the smallest free disk area.

- Stripe sets must be created from free disk space; they cannot be used on existing partitions.

- Stripe sets are file system independent and can be formatted with any Windows NT disk file system.

- Disk Administrator assigns the same color to all stripe sets. The status bar in the lower left corner of the Disk Administrator window tells whether a stripe set has parity or not.

- Only the Windows NT Advanced Server installation that created the stripe set will normally recognize it; other operating systems will not. MS-DOS identifies stripe set partitions as Non-DOS. Other installations of Windows NT and Windows NT Advanced Server identify stripe set partitions as being of "Unknown" file system type.

- An installation of Windows NT or Windows NT Advanced Server can restore disk configuration information and thereby recognize a stripe set created by a different installation on the same machine.
  See page 529 of the "Windows NT Advanced Server System Guide" for more information.

Disk Striping Without Parity
----------------------------

- Disk striping without parity provides no fault tolerance; if one disk in the stripe set fails or is damaged, the entire stripe set is lost.

- A stripe set can be created on as few as 2 and as many as 32 disks. Only one stripe on a stripe set can be located on each physical disk.

- Disk striping offers the best performance of all Windows NT disk management strategies.

- Disk Administrator assigns the same color to all stripe sets. For a stripe set without parity, the status bar in the lower left corner of the Disk Administrator window says simply "Stripe set #X."

- For information on creating and managing a stripe set, consult the "Windows NT System Guide" or "Windows NT Advanced Server System Guide."

Disk Striping With Parity
-------------------------

- A stripe set with parity can be created on as few as 3 and as many as 32 disks. Only one stripe on a stripe set with parity can be located on each physical disk.

- The amount of disk space used to store parity information is always equal to the size of one of the partitions in the set. For example, if a stripe set with parity is created on five disks, each with a 500 MB partition used for the stripe, 500 MB is used for parity information and 2000 MB is available for data storage.

- Regardless of how many disks are used in a stripe set with parity, data is recoverable only if no more than one disk is lost. If two or more disks are lost, the data is unrecoverable.

- The fault tolerance driver (FTDISK.SYS) makes the loss of one partition in a stripe set with parity invisible--you can read and write to a set with a lost partition as if it were healthy. But the stripe set is no longer fault tolerant: the loss of any remaining partition will result in an unrecoverable loss of all data in the stripe set.

- The status bar in Disk Administrator indicates stripe set condition.
  When a partition in the set is selected, Disk Administrator displays information about the set in the lower left corner of the window, as in:

     "Stripe set with parity #0 [HEALTHY]"

  Other status indicators include:

     [NEW]: this appears right after the stripe set has been created in Disk Administrator, and before the shutdown of the system and the actual generation of the set.

     [INITIALIZING]: this appears during stripe set generation.

     [RECOVERABLE]: this appears when one of the partitions in the set has been lost but the other partitions are undamaged, or when one partition in the set is not synchronized with the others.

- Disk Administrator assigns the same color to all stripe sets. To tell which have parity, look at the status bar in the lower left corner of the Disk Administrator window. For a stripe set with parity, the description says "Stripe set with parity #X."

- For information on creating and managing a stripe set with parity, consult the "Windows NT Advanced Server System Guide" and the "Windows NT Concepts and Planning Guide."

Additional reference words: 3.10
KBCategory:
KBSubCategory: fautol

=============================================================================

Copyright Microsoft Corporation 1994.

-----------------------------------------------------------------------------
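
The sizing rules for a stripe set with parity given in this article (3 to 32 disks, every stripe as large as the smallest free area selected, one stripe's worth of space used for parity) can be sketched in Python as follows. The function name parity_stripe_capacity and its dictionary result are hypothetical. The sketch illustrates the arithmetic only and is not how Disk Administrator computes it.

```python
def parity_stripe_capacity(free_areas_mb):
    """Estimate stripe, parity, and data sizes for a stripe set with parity.

    free_areas_mb holds one free-area size (in MB) per physical disk,
    because only one stripe of a set can be located on each disk.
    """
    disk_count = len(free_areas_mb)
    if not 3 <= disk_count <= 32:
        raise ValueError("a stripe set with parity needs 3 to 32 disks")

    # No stripe is larger than the smallest free disk area selected.
    stripe_mb = min(free_areas_mb)

    return {
        "stripe_mb": stripe_mb,
        # Parity always consumes the equivalent of one partition in the set.
        "parity_mb": stripe_mb,
        "data_mb": stripe_mb * (disk_count - 1),
    }


# The five-disk example from the article: 500 MB per disk.
five_disk_set = parity_stripe_capacity([500, 500, 500, 500, 500])

# Uneven free areas: the smallest one limits every stripe.
uneven_set = parity_stripe_capacity([340, 340, 300])
```

For the five-disk example this yields 500 MB of parity and 2000 MB of data, matching the figures given under "Disk Striping With Parity."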