2002-2003

Project Members: Sujay Godbole, Rohit Kulkarni, Sourabh Patwardhan, Ashish Vaidya

Abstract

This project provides a distributed file-serving environment for clusters that uses the storage space on client machines while maintaining NFS semantics. The main components of the system are a Metadata Server (MDS) and Data Storage Entities (DSEs). The MDS, a Linux kernel-space module, distributes data across the nodes and maintains the corresponding metadata. Each node that stores file data (a DSE) runs a standard NFS server, which handles storage and retrieval of file data on that DSE. Because standard NFS servers are used, DSEs on different platforms are supported. Every client of the MDS is a potential DSE, and additional DSEs can be integrated into the system, so storage space scales easily. Data is striped across the available DSEs to avoid hotspots and internal fragmentation of the shared storage. The striping strategy uses a stripe size chosen to minimize metadata overhead and internal fragmentation. Striping is also expected to improve read performance. DSE failure is accommodated by replicating the data stripes, which increases data availability.
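The striping and replication described above can be illustrated with a minimal sketch. It is not the project's actual placement algorithm: the 64 KB stripe size, the round-robin placement, the replica count of 2, and the names Chunk and locate are all assumptions made for illustration. The sketch maps a byte range of a file to the stripes it touches and to the DSEs that would hold each stripe.

```python
from dataclasses import dataclass

# Assumed stripe size; the abstract does not state the real value.
STRIPE_SIZE = 64 * 1024


@dataclass(frozen=True)
class Chunk:
    stripe: int    # stripe index within the file
    offset: int    # byte offset inside that stripe
    length: int    # number of requested bytes that fall in the stripe
    dses: tuple    # indices of DSEs holding the stripe, primary first


def locate(file_offset, length, num_dses, replicas=2, stripe_size=STRIPE_SIZE):
    """Split a byte range into per-stripe chunks (hypothetical round-robin layout)."""
    if replicas > num_dses:
        raise ValueError("need at least as many DSEs as replicas")
    chunks = []
    pos, end = file_offset, file_offset + length
    while pos < end:
        stripe = pos // stripe_size
        within = pos % stripe_size
        n = min(stripe_size - within, end - pos)
        # Assumption: stripe i lives on DSE (i mod N); replicas go to the next DSEs.
        primary = stripe % num_dses
        dses = tuple((primary + r) % num_dses for r in range(replicas))
        chunks.append(Chunk(stripe, within, n, dses))
        pos += n
    return chunks


# A 10 KB request starting 4 KB before a stripe boundary spans two stripes.
chunks = locate(60 * 1024, 10 * 1024, num_dses=3)
```

With round-robin placement, consecutive stripes land on different DSEs, which is one simple way to spread load; placing each replica on a different DSE than the primary lets a read fall back to the copy if one DSE fails.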
A client sends a standard NFS request to the MDS. For Read and Write requests only, the MDS locates the DSE(s) on which the requested data is stored and contacts the corresponding NFS server(s). To satisfy a Read, the data is sent to the requesting client via the MDS. For a Write, the MDS stripes the data across the available DSEs. Create and Remove are logical operations on the MDS that involve no data transfer, which improves performance and responsiveness. The MDS satisfies all other requests on its own, without involving any DSE. The distributed nature of the storage remains transparent to clients.
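The request flow above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the real MDS is a Linux kernel module speaking NFS, whereas the classes DSE and MDS, the handle method, the operation strings, the 4-byte stripe size, and the whole-file Write are hypothetical simplifications. The point it shows is that only Read and Write reach a DSE, while Create, Remove, and attribute queries touch MDS metadata alone.

```python
class DSE:
    """Stand-in for a node running a standard NFS server (hypothetical)."""

    def __init__(self):
        self.blocks = {}
        self.calls = 0  # counts requests that actually reached this DSE

    def write(self, key, data):
        self.calls += 1
        self.blocks[key] = data

    def read(self, key):
        self.calls += 1
        return self.blocks[key]


class MDS:
    """Toy metadata server: routes requests and keeps the stripe layout."""

    def __init__(self, dses, stripe_size=4):
        self.dses = dses
        self.stripe_size = stripe_size
        self.files = {}  # file name -> list of DSE indices, one per stripe

    def handle(self, op, name, data=b""):
        if op == "CREATE":
            self.files[name] = []          # metadata only, no DSE contacted
            return None
        if op == "REMOVE":
            del self.files[name]           # metadata only; reclaiming stripes not shown
            return None
        if op == "GETATTR":
            return {"stripes": len(self.files[name])}  # answered by the MDS itself
        if op == "WRITE":                  # simplified: rewrites the whole file
            layout = []
            for start in range(0, len(data), self.stripe_size):
                stripe = start // self.stripe_size
                idx = stripe % len(self.dses)  # assumed round-robin placement
                self.dses[idx].write((name, stripe), data[start:start + self.stripe_size])
                layout.append(idx)
            self.files[name] = layout
            return len(data)
        if op == "READ":                   # data flows back to the client via the MDS
            layout = self.files[name]
            return b"".join(self.dses[idx].read((name, s)) for s, idx in enumerate(layout))
        raise ValueError("unsupported operation: " + op)


dses = [DSE(), DSE()]
mds = MDS(dses)
mds.handle("CREATE", "f")
calls_after_create = [d.calls for d in dses]
mds.handle("WRITE", "f", b"abcdefghij")
result = mds.handle("READ", "f")
```

Because the client only ever calls handle on the MDS, it never learns which DSE holds which stripe, which mirrors the transparency the abstract describes.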