Article Summary


Summary: Layered architectures are used in large-scale distributed storage systems to manage the complexity of building and operating them. In this paper, the researchers study a layered architecture, namely the multilayer design of the Facebook Messages stack, and explore how HDFS could be modified to improve the storage system. HDFS is designed by default to store large files, yet the researchers observed that 90% of the data files are smaller than 15MB. The HBase design is also simple, but its performance is lower than that of HDFS.

Strengths: The researchers ran multiple simulations of different caching and logging configurations and tried several architectural changes to reach their results. The combined logging approach proposed in this research allows HBase to be used with a smaller performance penalty than it normally incurs. The research also provides a basis for merging HBase logs onto a single designated disk, which reduces logging latency.
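The intuition behind combined logging can be illustrated with a toy seek-counting model. This is a hypothetical sketch, not the paper's simulator: the function `count_seeks`, the constant `REGION_SIZE`, and the rule that every non-contiguous write costs exactly one seek are simplifying assumptions. When several logs share one disk, interleaved appends force the disk head to jump between log regions; funneling all appends into one log keeps the writes sequential.

```python
from typing import Iterable, List, Tuple

# Hypothetical toy disk model (an assumption, not the paper's method):
# each log occupies its own contiguous region on one disk, and a "seek"
# is counted whenever a write does not start where the previous write ended.
REGION_SIZE = 1 << 30  # assumed spacing between per-server log regions (1 GiB)


def count_seeks(appends: Iterable[Tuple[int, int]], combined: bool) -> int:
    """Count non-contiguous writes for a stream of (server_id, nbytes) appends.

    combined=False: each server appends to its own log region (separate logs).
    combined=True:  all servers append to one shared log (combined logging).
    """
    log_lengths = {}  # log key -> bytes written to that log so far
    head = None       # disk offset where the previous write ended
    seeks = 0
    for server_id, nbytes in appends:
        key = 0 if combined else server_id
        offset = key * REGION_SIZE + log_lengths.get(key, 0)
        if head is not None and offset != head:
            seeks += 1  # head must move to a different position on disk
        log_lengths[key] = log_lengths.get(key, 0) + nbytes
        head = offset + nbytes
    return seeks


# Three servers taking turns appending 4 KiB records.
stream: List[Tuple[int, int]] = [(i % 3, 4096) for i in range(9)]
print(count_seeks(stream, combined=False))  # 8: every write after the first seeks
print(count_seeks(stream, combined=True))   # 0: all writes are sequential
```

In this simplified model, the benefit comes entirely from turning interleaved writes to several files into one sequential stream; real latency gains also depend on factors the sketch ignores, such as replication and rotational delay.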

Weaknesses: The study derived its results from a representative sample of machines instead of running the experiment on all available machines, because of the time and resources that would have been required. The researchers also used tracing code with limited access to Facebook user data: it recorded only the size of the data, not its content. This could have affected the outcomes of the research.

Questions: How can the additional I/O workload produced by simple layering techniques be overcome?

How would the proposed system respond if the content of the data, and not only its size, were taken into account?