Summary: Recent storage systems consist of multiple disks and are designed to handle individual disk failures well. However, these systems have been observed to face serious challenges when multiple disk failures occur. The present study designs and evaluates RAIDSHIELD, a defense mechanism that checks the status of disks in a storage system and replaces the disks that are predicted to fail. The study also develops and simulates a process that predicts in advance which RAID group is at risk of failing from an instantaneous disk failure.
Strengths: Disk failure is not a simple mechanism, and there has been no specific definition or set of definitions for identifying it. The study attempts to fill this gap by identifying a criterion for defining a disk failure. It sheds light not only on complete disk failures but also on partial disk failures. The RAIDSHIELD system continuously monitors disks that show periodic errors and replaces them when they reach a certain threshold, which helps protect a group of disks from failing. The disk analysis is comprehensive, covering 1 million disks.
Weaknesses: The study is conducted on specific backup systems, so the results may not generalize to all storage systems. The developed system depends on the ability of disks to report errors before failing. If errors are not reported before a failure occurs, the performance of the proposed system might suffer.
Questions: How would the proposed system respond if disk faults are tolerated up to the application level?
How does the system's performance differ between sector errors and complete disk failures?