RAID 5
From DpWiki
Introduction
Similar to RAID 4, RAID 5 arrays combine data striping with parity information in order to increase performance while maintaining redundancy. In these arrays, blocks of data (commonly called 'stripes') are alternated between all of the drives in the array. Since the drives can operate in parallel, the transfer rates of sequential tasks are significantly improved over single-drive systems. To provide a basic level of redundancy, RAID 5 arrays mix parity information into the data stream so that the system can survive the failure of any single drive in the array. Note that the loss of two or more drives will result in a total loss of all data on the array, so it is critical for users to replace failed drives as quickly as possible.
The primary difference between RAID 4 and RAID 5 is that RAID 5 arrays distribute their parity information across all drives in the array rather than using a dedicated drive. In a RAID 4 array, the performance impact of a drive failure is dependent on which one of the drives fails. If the parity drive is damaged, performance is generally unaffected; however, the loss of any one of the data drives can cause a significant reduction in performance, as the missing data has to be calculated from the parity information. As RAID 5 arrays distribute parity information across all disks, the performance impact is more predictable: some stripes will still have all of their data blocks available regardless of which drive fails.
RAID 5 provides many of the performance benefits that RAID 0 brings to the table, but adds a basic layer of redundancy in order to decrease the risk of data loss. This allows these arrays to be used in applications that require data to be protected, but also need performance levels beyond what any individual drive can provide. As with RAID 0, the improvement in performance scales with the number of drives that are contained in the array.
Method of Operation
When an array is operating in RAID 5 mode, data being stored on the array is separated into discrete blocks. These blocks, as well as calculated parity information, are then distributed across all of the drives in the array. Unlike RAID 4 arrays, RAID 5 controllers will distribute the parity data on all drives in the array rather than using a dedicated drive. See the figure to the right for an illustration of how this mechanism works.
As each drive in the array can operate in parallel, more than one block can be stored or retrieved at any given time. In the three-drive example given above, for instance, disk A can retrieve block 1 while disk B is retrieving block 2. On a conventional hard drive, the system would need to wait for block 1 to be retrieved before it could begin working on the second block.
As such, when handling a sequential data transfer a three-disk RAID 5 array can theoretically double the throughput of a single drive. A four-disk array can theoretically triple the throughput, and so on. This is true for both read and write operations, as the interleaving process benefits both tasks equally. Unfortunately, the process of calculating the parity information adds significant overhead and generally means that these arrays will rarely be able to achieve this level of performance.
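The distribution of data and parity blocks described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the rotation rule (parity starts on the last disk and moves one disk to the left on each stripe) is an assumption, since actual controllers may place parity differently.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the disk that holds parity for the given stripe.

    Assumption: parity starts on the last disk and moves one disk to the
    left on each successive stripe. Real controllers may rotate differently.
    """
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Return one row per stripe; each cell is a data block number or 'P'."""
    rows = []
    block = 1  # data blocks are numbered from 1, as in the text
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        row = []
        for disk in range(num_disks):
            if disk == p:
                row.append("P")
            else:
                row.append(str(block))
                block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Columns are disks A, B, C; rows are successive stripes.
    for row in layout(3, 3):
        print(" ".join(row))
```

For a three-disk array this prints the rows "1 2 P", "3 P 4" and "P 5 6", so every disk holds parity for one stripe in three.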
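One source of the parity overhead mentioned above is that changing a single data block also requires the parity block for that stripe to be updated. The sketch below assumes parity is the bytewise XOR of the data blocks in a stripe, which is the usual scheme for RAID 5; the function names are hypothetical and the blocks are plain byte strings rather than real disk sectors.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after one data block changes.

    XOR-ing the old data out of the parity and the new data in gives the
    same result as recomputing parity from every data block in the stripe.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    block_1 = b"\x0f\xf0"
    block_2 = b"\x33\x55"
    parity = xor_bytes(block_1, block_2)

    new_block_1 = b"\xaa\xbb"
    new_parity = updated_parity(parity, block_1, new_block_1)

    # The shortcut matches a full recomputation over the stripe.
    print(new_parity == xor_bytes(new_block_1, block_2))
```

Under this assumption, a small write involves reading the old data and old parity and then writing the new data and new parity, which is more work than the single write a lone drive would perform.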
In the case of a drive failure, the data on the missing disk can be reconstructed using the information on the remaining drives. When this happens, the performance of the array will be reduced, as the controller will have to do additional work to fill in the missing blocks of data. Unlike arrays with dedicated parity drives (such as RAID 4), the performance of a damaged array will be the same regardless of which disk has failed. Once the failed drive is replaced, however, the controller will reconstruct the missing data on the new disk and allow the system to return to full speed.
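Reconstruction of a missing block can be illustrated with a minimal sketch, assuming the parity block is the bytewise XOR of the data blocks in the stripe (the usual scheme for RAID 5). The helper names are hypothetical, and the "disks" here are simply byte strings held in a list.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute the block at failed_index from the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"DATA", b"MORE"])  # two data blocks plus parity
    print(rebuild(stripe, 0))  # recovers the first data block
    print(rebuild(stripe, 1))  # recovers the second data block
```

The same calculation recovers a lost parity block, which is why it does not matter which single disk fails. If two blocks of the same stripe are missing, the XOR of the survivors no longer determines either one.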
Reliability
RAID 5 can survive the failure of a single disk and continue to operate; however, the loss of two drives will still result in the loss of all data stored on the array. It is important to remember that once a drive fails the array has no remaining redundancy, so, as with a RAID 0 array, any further drive failure will destroy the data. It is therefore critical that a replacement be installed as soon as possible. As long as this is done, these arrays can be extremely reliable and are a good way to store critical data that requires high performance.
In many cases RAID 5 controllers provide an option to install an additional drive as a hot spare. When installed, this drive will remain idle until a drive failure occurs - when that happens, the controller will automatically use this drive to replace the one it has lost. This significantly reduces the risk of data loss after a drive failure as it is not reliant on the user to replace the failed part on their own. The use of hot spares is especially important when used in systems that do not have hot swap capabilities as it is the only way to repair the array without shutting down the system.
Overhead
The efficiency of RAID 5 arrays is dependent on the number of drives used to create the array. Regardless of how many disks the array contains, the equivalent capacity of a single drive will still be used for parity information, so the larger the array the more efficient it will be. As such, three 160GB drives in a RAID 5 array will provide 320GB of usable capacity (roughly 67% of the purchased capacity), four 160GB drives would provide 480GB of usable capacity (75% of purchased), etc.
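The arithmetic above can be written as a short sketch. The function names are hypothetical, and the calculation assumes all drives are the same size, with one drive's worth of capacity given over to parity.

```python
def usable_capacity(num_drives: int, drive_size_gb: float) -> float:
    """Usable capacity in GB: one drive's worth of space goes to parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (num_drives - 1) * drive_size_gb


def efficiency(num_drives: int) -> float:
    """Fraction of the purchased capacity that is usable for data."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (num_drives - 1) / num_drives


if __name__ == "__main__":
    for n in (3, 4, 8):
        print(n, usable_capacity(n, 160), f"{efficiency(n):.0%}")
```

With 160GB drives this gives 320GB for three drives and 480GB for four, matching the figures in the text, and shows the efficiency climbing as more drives are added.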
Requirements
A RAID 5 array requires a minimum of three hard drives of equal size as well as a hard drive controller that supports it. To maximize performance, it is generally recommended that all of the drives used in the array are of the same make and model.
See Also
- RAID - General overview of RAID and all of its different levels.
- RAID 0 - Stripes the data at the block level to maximize performance, however that increases the risk of lost data versus individual disks.
- RAID 1 - Mirrors all data onto all of the drives contained in the array. This provides the highest level of protection, however it is also relatively inefficient.
- RAID 2 - Similar to RAID 0 but stripes at the bit level rather than the block level. This level is very uncommon and not supported by any modern RAID controllers.
- RAID 3 - Uses byte-level striping with a dedicated parity disk. This can provide similar performance to RAID 0 but can survive the failure of one drive.
- RAID 4 - Same as RAID 3, but uses block-level striping.
- RAID 6 - Uses block-level striping with two sets of distributed parity information. This can provide similar performance to RAID 0 but can survive the failure of up to two drives.
- RAID 10 - A combination of RAID 0 and RAID 1. Data is striped across two RAID 1 arrays. This provides the performance advantages of RAID 0 and can survive at least one drive failure.
- RAID 0+1 - A combination of RAID 0 and RAID 1. In this mode, two RAID 0 arrays are mirrored. This provides the performance advantages of RAID 0 and can survive at least one drive failure.
- Matrix RAID - A proprietary technology used by some Intel chipsets that allows drives to be partitioned and different RAID levels used on each block.

