---------------------------------------------------------------

PSS ID Number: Q100110
Article last modified on 06-03-1994

3.10

WINDOWS


---------------------------------------------------------------
The following information applies to:

 - Microsoft Windows NT Advanced Server, version 3.1
---------------------------------------------------------------

SUMMARY
=======

This article explains the differences between redundant arrays of
inexpensive disks (RAID) levels 0 through 5 and identifies which
levels Microsoft Windows NT Advanced Server supports. It also explains
some of the advantages and disadvantages of the various RAID
configurations.

MORE INFORMATION
================

Microsoft Windows NT Advanced Server supports only RAID 0, RAID 1, and
RAID 5. Fault tolerance and disk array implementations, while
generally based on the design described here, vary considerably among
manufacturers.

RAID 0
------

RAID 0 is a disk array that implements striping without any drive
redundancy. It offers no fault tolerance and is less reliable than a
single-drive implementation; its only advantage is speed. RAID 0 is
suitable for certain special applications, such as scientific
analysis or imaging, where compromised system reliability can be
tolerated.
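
The striping idea can be illustrated with a minimal Python sketch. It
assumes a simple round-robin layout in which consecutive logical blocks
rotate across the disks; the function name locate_block and the block
granularity are illustrative assumptions, not a description of any
particular vendor's implementation.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Return (disk index, block offset on that disk) for a striped array."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive logical blocks rotate across the disks in a fixed order,
    # so a large transfer keeps every disk busy at the same time.
    return logical_block % num_disks, logical_block // num_disks


if __name__ == "__main__":
    for block in range(6):
        disk, offset = locate_block(block, num_disks=3)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because every disk holds part of every large file, the loss of any one
disk damages all of the data, which is why the array is less reliable
than a single drive.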

RAID 1
------

RAID 1 is disk mirroring. Two drives store identical information so
that one is a mirror of the other. For every disk operation, the
system must write the same information to both disks. Because dual
write operations can degrade system performance, many systems employ
duplexing, where each mirror drive has its own host adapter. While the
mirror approach provides good fault tolerance, it is relatively
expensive to implement, because only half of the available disk space
can be used for storage while the other half is used for mirroring.
Novell NetWare, in particular, incorporates support for disk mirroring.

RAID 2
------

RAID 2 uses extra check disks, with data bits striped across the data
and check disks. The data includes an interleaved Hamming code, which
can be used to detect and correct single bit errors as well as detect
double bit errors. Because of the amount of information required for
the check bits, several check disks are required to implement RAID 2.
RAID 2 is optimal for reading and writing large data blocks at high
data transfer rates, but smaller block reads are inefficient. The
read-modify-write operations required for small block writes also
result in poor performance. RAID 2 is generally impractical for
smaller systems and is not available with Microsoft Windows NT
Advanced Server.

RAID 3
------

RAID 3 uses a single redundant check disk (sometimes referred to as a
parity disk) for each group of drives. Data written to the RAID 3 disk
array is bit striped across the data disks. The check disk receives
the XOR (exclusive OR) of all the data values written to the data
drives. Because data transfers to and from individual drives occur
only in unit sector multiples, the minimum amount of data that can be
written to or read from a RAID 3 disk array is the number of data
drives multiplied by the number of bytes per sector (known as a
transfer unit). This option is not available with Microsoft Windows NT
Advanced Server.
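
The transfer unit arithmetic can be sketched in Python as follows. The
512-byte sector size used as a default is an assumption chosen for
illustration; the article does not state a sector size.

```python
def transfer_unit_bytes(data_drives: int, bytes_per_sector: int = 512) -> int:
    """Smallest amount of data a bit-striped array can read or write.

    bytes_per_sector defaults to 512, which is an assumed value.
    """
    if data_drives < 1 or bytes_per_sector < 1:
        raise ValueError("both arguments must be positive")
    # Every data drive moves at least one whole sector per transfer.
    return data_drives * bytes_per_sector


if __name__ == "__main__":
    print(transfer_unit_bytes(4))  # four data drives, assumed 512-byte sectors
```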

RAID 4
------

RAID 4 offers a disk array architecture that is better optimized for
transaction processing applications than RAID 3. RAID 4 performs block
striping or sector striping on the data on the drives, while RAID 3
performs bit striping. Thus, with RAID 4, one entire sector is written
to one drive, the next sector is written to the next drive, and so on.
This technique allows multiple unrelated sectors to be read
simultaneously, and it is particularly valuable for small reads that
need to access only a single drive in the array. RAID 4 dedicates one
entire disk for storing check data, allowing data from a failed drive
to be easily recovered. While this approach allows multiple reads to
occur simultaneously, with different sectors from different drives,
write operations are bottlenecked. Because the single check disk
must be written to during every write operation, only one
write operation can take place at a time. This option is not available
with Microsoft Windows NT Advanced Server.

RAID 5
------

Unlike RAID 4, which dedicates a single physical disk for check data,
RAID 5 dedicates the equivalent of one entire disk for storing check
data but distributes the check data over all the drives in the group.
For example, sector 1 of disk 5 may be assigned to hold the check data
for sector 1 of the remaining data drives and so on. Because the check
data is simply the XOR of all the write data values for the
corresponding sector on each of the data disks, as long as the old
sector data and the old check data values are known, the new check
data for a single sector write can be calculated without having to
read the corresponding sectors from the other data disks. Thus, only
two disks are involved in a single sector write operation: the target
data disk and the corresponding disk that holds the check data for
that sector. This is in contrast to the RAID 3 implementation, which
requires all drives in a group to be read and written when a single
sector size write occurs. The primary benefit of the RAID 5
distributed check data approach is that it permits write operations to
take place simultaneously. It also allows multiple reads to take place
simultaneously and is efficient in handling small amounts of
information. This is the preferred option when setting up fault
tolerance in Microsoft Windows NT Advanced Server.
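
The single-sector check data update described above can be sketched in
Python. This is an illustration of the XOR arithmetic only, not the
Windows NT implementation; the two-byte "sectors" and all function names
are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sectors."""
    if len(a) != len(b):
        raise ValueError("sectors must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(sectors: Sequence[bytes]) -> bytes:
    """Check data computed the long way: XOR of every data sector."""
    return reduce(xor_bytes, sectors)


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Check data after a single-sector write, using only the old values."""
    # XOR with the old data cancels its contribution to the check data;
    # XOR with the new data adds the new contribution.
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Hypothetical two-byte "sectors" on three data disks.
data = [bytes([0x0F, 0xA0]), bytes([0x33, 0x01]), bytes([0xF0, 0x5A])]
parity = full_parity(data)

# Rewrite the sector on the second data disk without reading the others.
new_sector = bytes([0xFF, 0x00])
new_parity = updated_parity(data[1], new_sector, parity)
data[1] = new_sector
```

After the update, new_parity matches the value that full_parity(data)
produces, even though only the target data disk and the check data were
read.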

How RAID 3, RAID 4, and RAID 5 Recover and Rebuild
--------------------------------------------------

RAID 3, RAID 4, and RAID 5 disk array designs allow for data recovery.
When data is written to multiple data disks, the XOR of all the data
values is written to the check disk. If any one disk fails, the
missing data from that disk can be determined (recovered) by taking
the XOR of the data values from the remaining data drives and the
check disk. This operation can be implemented in either the system
software or the host adapter.
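
The recovery step can be sketched in Python as follows. The sector
contents, the choice of failed disk, and the function name xor_sectors
are hypothetical and serve only to show the XOR relationship.

```python
from functools import reduce
from typing import Sequence


def xor_sectors(sectors: Sequence[bytes]) -> bytes:
    """XOR a group of equal-length sectors together."""
    if not sectors:
        raise ValueError("at least one sector is required")
    if any(len(s) != len(sectors[0]) for s in sectors):
        raise ValueError("sectors must be the same length")
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), sectors)


# Hypothetical contents of the same sector on three data disks.
data_disks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
check_disk = xor_sectors(data_disks)

# Suppose the second data disk fails.
failed = 1
survivors = [s for i, s in enumerate(data_disks) if i != failed]

# XOR of the surviving data sectors and the check data yields the lost sector.
recovered = xor_sectors(survivors + [check_disk])
```

The same calculation fails if two disks are lost, because a single XOR
value cannot resolve two unknowns.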

Additional reference words: 3.10 hrdwr filsys
KBCategory:
KBSubCategory: fautol

=============================================================================

Copyright Microsoft Corporation 1994.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Sparing Bad Disk Sectors

The file system verifies all sectors when it formats a volume. All faulty
sectors are spared from service. Additional Windows NT fault-tolerance 
services add sector-recovery capabilities to the system. When there is a
sector I/O failure in a fault-tolerant system with redundant copies of the
data, the Windows NT Fault Tolerance driver attempts to spare the bad sector
from use. This includes performing a device control asking the disk device
driver to spare the sector from use. Small Computer System Interface (SCSI)
devices can do this, but AT devices [that is, Integrated Device Electronics
(IDE) and Enhanced Small Device Interface (ESDI)] cannot. When the sector
cannot be spared, the correct information obtained from the redundant copy
is returned to the file system with a status message stating that there is
a faulty sector in the I/O. The Windows NT File System (NTFS) reacts to this
message by attempting to locate the failure and sparing the bad sectors by
removing their usage from the sector map of the file system. The
Administrator is also notified in the Event Viewer program of the potential
for data loss if the partition containing the redundant copy also fails.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Looking at Redundant Arrays of Inexpensive Disks

What we touch on here is how Performance Monitor sees various redundant 
arrays of inexpensive disks (RAID) and fault-tolerant disk configurations. 
We'll also mention some issues relating to their relative performance as 
observed on one computer, but it would be an error to extrapolate these 
results to some other system. You need to perform these experiments on your
own configurations and under your own real or anticipated workloads to make
judgments about optimal disk configuration. 

In our example, we have (as physical unit 0) a hardware RAID array of 4 
spindles and 800 MB capacity. We partitioned this into drive C (300 MB), 
and drives F and G, which are 250 MB each. We also have three other disk 
units with about 340 MB capacity each to play with. We created a 200 MB 
file on a single partitioned drive D, and another one on a mirrored 
partition on the other two disks, drive E. After we finished the experiments
on drives D and E, we rearranged those three spindles as a single striped
partition for drive D (no parity) and created a 200 MB file on that.

We had two disk controllers, one for the hardware RAID array, and another
for all three of the other disk units.

All the 200-MB file creation times were 420 seconds, except for the striped
partition on drive D, which took 314 seconds. In the next two
figures we show the difference in behavior of drive D as a single drive and
as a striped volume.

Figure 4.23  File creation on a single spindle

Figure 4.24  File creation on a three-spindle striped volume without parity

Notice the Avg. Disk sec/Write is 0.081 for the single unit and 0.061 for the
striped set. This results in higher Disk Bytes/sec. Striping reduces seeking
and therefore improves performance.
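
As a rough cross-check, the file creation times and write latencies
quoted above can be turned into throughput figures with a few lines of
Python. This is a back-of-the-envelope sketch that uses only the numbers
in the text; the helper name throughput_mb_per_sec is illustrative.

```python
def throughput_mb_per_sec(size_mb: float, seconds: float) -> float:
    """Average throughput for a transfer of size_mb that took seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


FILE_SIZE_MB = 200

single = throughput_mb_per_sec(FILE_SIZE_MB, 420)   # single spindle
striped = throughput_mb_per_sec(FILE_SIZE_MB, 314)  # three-spindle stripe

# Ratio of elapsed times and ratio of Avg. Disk sec/Write values.
elapsed_ratio = 420 / 314
write_time_ratio = 0.081 / 0.061

print(f"single:  {single:.3f} MB/sec")
print(f"striped: {striped:.3f} MB/sec")
print(f"elapsed time ratio: {elapsed_ratio:.2f}")
print(f"write time ratio:   {write_time_ratio:.2f}")
```

The two ratios come out close to each other (about 1.34 and 1.33), so the
shorter average write time accounts for nearly all of the reduction in
elapsed time on this one computer.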
 
In Figure 4.24, drive D is striped across units 1, 2, and 3. Let's look at
the performance of the Physical Disks.
 
Figure 4.25  Physical disk statistics for a striped volume

Whoops. What happened to units 2 and 3? Well, DISKPERF.SYS cannot see which
physical volume the write operation executes on. This is because
DISKPERF.SYS is located above the fault-tolerant disk driver FTDISK.SYS in
the driver stack, as shown in Figure 4.1. The decision as to which spindle
will get the data is made by FTDISK.SYS and therefore is invisible to
DISKPERF.SYS. The only way to get visibility would be to add another
measurement driver below FTDISK.SYS on the stack. But this would increase
the overhead, and we elected not to do it. The additional information is
not important enough to warrant the overhead.
 
Mirrors, stripes, and hardware RAID devices all share this Performance
Monitor characteristic:  Performance Monitor summarizes all Physical Disk
statistics under the first unit assigned to the disk array.

The next experiment was to read 100 unbuffered (with no file system cache),
normally distributed records of 8192 bytes from the file on each drive type. 

Figure 4.26  Reading from three-spindle striped volume

Figure 4.27  Reading from four-spindle hardware RAID

The hardware RAID is slower at this, for some reason. Perhaps its physical
drives are slower. The next two figures show our old test of rereading
records of various sizes to determine maximum disk throughput. The first
figure shows the striped volume, the next one shows the hardware RAID.

Figure 4.28  Disk throughput test for a three-spindle striped volume

Figure 4.29  Disk throughput test for a four-spindle hardware RAID

Well, now isn't that interesting! The RAID device is quite impressive at
higher transfer sizes, and increases monotonically in performance as the
transfer size does. The striped volume is not so pretty. It has spots
where the performance degrades due to missed revolutions. (Because we
are rereading the same record over and over, only one spindle
participates in this test.) But there is a serendipitous node at the
8192 transfer size, which just happened to be the size in our test case.
 
Which of these two technologies would you rather spend your money on? You
need to understand the transfer size characteristics of your traffic to be
sure. For 4096- or 8192-byte transfers, the striped volume wins; for
transfers larger than 5 pages, the hardware RAID is the clear winner. Now
don't get us in trouble by trying to use these results directly in your
shop. There are a lot of variables. With a different controller, drive, or
processor you get different results.

Another way to alter the outcome is to try writing instead of reading. When
we substitute writing for reading in the above test, we get 0.016 seconds
per record for the striped volume, 0.028 for the single spindle, 0.030 for
the hardware RAID, and 0.041 for the mirror. Writing is slower on the mirror
because both spindles must be written. If we had another controller for one
half of the mirrored pair we would have possibly seen an improvement, not to
mention better fault tolerance.
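
The per-record write times above are easier to compare as rates. The
following Python sketch simply inverts the four measured values; it adds
no new measurements, and the dictionary names are illustrative.

```python
# Seconds per 8192-byte record written, as reported in the text.
write_seconds_per_record = {
    "striped volume": 0.016,
    "single spindle": 0.028,
    "hardware RAID": 0.030,
    "mirror": 0.041,
}

# Records written per second is the reciprocal of seconds per record.
records_per_second = {
    name: 1.0 / seconds for name, seconds in write_seconds_per_record.items()
}

# Sort from fastest to slowest for display.
ranking = sorted(records_per_second, key=records_per_second.get, reverse=True)

for name in ranking:
    print(f"{name:15s} {records_per_second[name]:5.1f} records/sec")
```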

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

PSS ID Number: Q113933
Article last modified on 04-25-1994

3.10

WINDOWS


----------------------------------------------------------------------
The information in this article applies to:

 - Microsoft Windows NT operating system version 3.1
 - Microsoft Windows NT Advanced Server version 3.1
----------------------------------------------------------------------

SUMMARY
=======

Windows NT and Windows NT Advanced Server disk striping (RAID Level 0)
creates a disk file system called a stripe set by dividing data into blocks
and spreading them in a fixed order across all disks in an array. By adding
data to all partitions in the set at the same rate, disk striping offers
the best performance of all Windows NT disk management strategies.

Windows NT Advanced Server allows you to establish fault tolerant disk
striping with parity (RAID Level 5), which stores parity information along
with striped data on different disks in the array for redundancy. Disk
striping with parity is available only with Windows NT Advanced Server,
not with Windows NT.

The rest of this article describes disk striping with and without parity
in Windows NT and Windows NT Advanced Server.

MORE INFORMATION
================

Disk Striping in General--With or Without Parity
------------------------------------------------

 - Stripe sets are user-transparent: when they are created, all partitions
   are assigned the same drive letter.

 - All partitions in a stripe set are the same size. If you select free
   disk areas of different sizes when you create a stripe set, no stripe
   will be larger than the smallest free disk area.

 - Stripe sets must be created from free disk space; they cannot be used
   on existing partitions.

 - Stripe sets are file system independent and can be formatted with any
   Windows NT disk file system.

 - Disk Administrator assigns the same color to all stripe sets. The status
   bar in the lower left corner of the Disk Administrator window tells
   whether a stripe set has parity or not.

 - Only the Windows NT Advanced Server installation that created the stripe
   set will normally recognize it; other operating systems will not. MS-DOS
   identifies stripe set partitions as Non-DOS. Other installations of
   Windows NT and Windows NT Advanced Server identify stripe set partitions
   as being of "Unknown" file system type.

 - An installation of Windows NT or Windows NT Advanced Server can restore
   disk configuration information and thereby recognize a stripe set
   created by a different installation on the same machine. See page 529 of
   the "Windows NT Advanced Server System Guide" for more information.
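
The sizing rule for free disk areas of different sizes can be sketched
in Python. The function name and the example sizes are hypothetical; the
sketch assumes each partition in the set is made exactly as large as the
smallest selected free area, and the 2-to-32 disk range is taken from
the limits stated for stripe sets without parity.

```python
from typing import List, Tuple


def stripe_set_layout(free_areas_mb: List[int]) -> Tuple[int, int]:
    """Return (size of each partition, combined size of all partitions) in MB."""
    if not 2 <= len(free_areas_mb) <= 32:
        raise ValueError("a stripe set uses between 2 and 32 disks")
    # Every partition in the set is the same size, so the smallest
    # free area limits all of the others.
    member_mb = min(free_areas_mb)
    return member_mb, member_mb * len(free_areas_mb)


if __name__ == "__main__":
    member, total = stripe_set_layout([300, 250, 340])
    print(f"each partition: {member} MB, combined: {total} MB")
```

For a stripe set with parity, the equivalent of one partition out of the
combined size holds parity information rather than data.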

Disk Striping Without Parity
----------------------------

 - Disk striping without parity provides no fault tolerance; if one disk
   in the stripe set is bad or damaged, the entire stripe set is lost.

 - A stripe set can be created on as few as 2 and as many as 32 disks. Only
   one stripe on a stripe set can be located on each physical disk.

 - Disk striping offers the best performance of all Windows NT disk
   management strategies.

 - Disk Administrator assigns the same color to all stripe sets. For a
   stripe set without parity, the status bar in the lower left corner of
   the Disk Administrator window says simply "Stripe set #X" (with no
   mention of parity).

 - For information on creating and managing a stripe set, consult the
   "Windows NT System Guide" or "Windows NT Advanced Server System Guide."

Disk Striping With Parity
-------------------------

 - A stripe set with parity can be created on as few as 3 and as many as
   32 disks. Only one stripe on a stripe set with parity can be located on
   each physical disk.

 - The amount of disk space used to store parity information is always
   equal to the size of one of the partitions in the set. For example, if
   a stripe set with parity is created on five disks, each with a 500 MB
   partition used for the stripe, 500 MB is used for parity information
   and 2000 MB is available for data storage.

 - Regardless of how many disks are used in a stripe set with parity,
   data is recoverable only if no more than one disk is lost. If two or
   more disks are lost, the data is unrecoverable.

 - The fault tolerance driver (FTDISK.SYS) makes the loss of one partition
   in a stripe set with parity invisible--you can read and write to a set
   with a lost partition as if it were healthy. But the stripe set is no
   longer fault tolerant: the loss of any remaining partition will result
   in an unrecoverable loss of all data in the stripe set.

 - The status bar in Disk Administrator indicates stripe set condition.
   When a partition in the set is selected, Disk Administrator displays
   information about the set in the lower left corner of the window, as in:
   "Stripe set with parity #0 [HEALTHY]". Other status indicators include:

      [NEW]: this appears right after the stripe set has been created in
             Disk Administrator, and before the shutdown of the system and
             the actual generation of the set.

      [INITIALIZING]: this appears during stripe set generation.

      [RECOVERABLE]: this appears when one of the partitions in the set
                     has been lost but the other partitions are undamaged,
                     or when one partition in the set is not synchronized
                     with the others.

 - Disk Administrator assigns the same color to all stripe sets. To tell
   which have parity, look at the status bar in the lower left corner of
   the Disk Administrator window. For a stripe set with parity, the
   description says "Stripe set with parity #X."

 - For information on creating and managing a stripe set with parity,
   consult the "Windows NT Advanced Server System Guide" and the "Windows
   NT Concepts and Planning Guide."
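
The parity space rule in the list above can be checked with a short
Python sketch. The function name is illustrative; the 3-to-32 disk range
and the five-disk, 500 MB example come from this article.

```python
from typing import Tuple


def parity_stripe_capacity(num_disks: int, partition_mb: int) -> Tuple[int, int]:
    """Return (MB available for data, MB used for parity)."""
    if not 3 <= num_disks <= 32:
        raise ValueError("a stripe set with parity uses between 3 and 32 disks")
    if partition_mb <= 0:
        raise ValueError("partition_mb must be positive")
    # Parity always consumes the equivalent of one partition in the set.
    parity_mb = partition_mb
    data_mb = (num_disks - 1) * partition_mb
    return data_mb, parity_mb


if __name__ == "__main__":
    data_mb, parity_mb = parity_stripe_capacity(5, 500)
    print(f"data: {data_mb} MB, parity: {parity_mb} MB")
```

The share of space lost to parity shrinks as disks are added: one third
with three disks, one fifth with five.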

Additional reference words: 3.10

KBCategory:
KBSubCategory: fautol

=============================================================================

Copyright Microsoft Corporation 1994.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------