As I previously posted, Amazon have announced that persistent storage is on its way for EC2. Sadly the public launch date has not yet been disclosed - it’s “coming later in the year”. In the meantime this leaves a question: what do you do when you need data to persist?
There are a number of options, especially when you start to consider scalability and fault tolerance. I won’t dare claim I’ve considered all the options out there - I’ve simply started to look at what the immediate options are for persistent storage.
Without further ado, then, on to the technology. I’ve found a number of choices. Rather than being a ‘how to’, this post is about the solutions I have found so far that are on my list for consideration. Hopefully, if one of the options fits my needs, I will provide a guide at a later date!
At a high level the solutions can be split into two categories: file system interfaces to S3, or architecture-based replication and backup. There are likely to be other options that allow you to use persistent storage outside the Amazon cloud, but for the time being I’m trying to stay within the bubble. There is also a cross-over between the two strategies, e.g. using a particular architecture with a file system interface.
The file system interfaces to S3 are typically based on either the FUSE library or implemented as an NBD. FUSE allows developers to create interfaces (in user space) that ultimately surface to the user as a standard file system, while NBD allows interfaces that appear as a raw, unformatted physical disk (allowing the user to format the disk as they please).
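To make the distinction a little more concrete, here is a rough Python sketch of the two mappings. It is purely illustrative: a plain dict stands in for an S3 bucket, no real FUSE, NBD or S3 calls are made, and the class names, method names and key format are my own inventions rather than anything the tools below actually use.

```python
# A plain dict stands in for an S3 bucket (key -> bytes).
# Nothing here talks to S3, FUSE or NBD; all names are hypothetical.


class FileStyleStore:
    """FUSE-style idea: each file path maps to one named object."""

    def __init__(self, bucket):
        self.bucket = bucket

    def write_file(self, path, data):
        # The object key is simply the path without its leading slash.
        self.bucket[path.lstrip("/")] = bytes(data)

    def read_file(self, path):
        return self.bucket[path.lstrip("/")]


class BlockStyleStore:
    """NBD-style idea: the device is a run of fixed-size blocks,
    stored here as one object per block. Files and directories only
    exist once a file system has been created on top of the device."""

    def __init__(self, bucket, block_size=4096):
        self.bucket = bucket
        self.block_size = block_size

    def _key(self, index):
        return f"block-{index:08d}"

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.bucket[self._key(index)] = bytes(data)

    def read_block(self, index):
        # Blocks that were never written read back as zeros.
        return self.bucket.get(self._key(index), bytes(self.block_size))
```

With the file-style store the bucket ends up holding keys such as logs/app.log, which other programs could read directly. With the block-style store it holds opaque keys such as block-00000002, which only make sense to whatever file system is sitting on the device.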
s3fs - Open Source
s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files). Maximum file size is 5 GB.
PersistentFS - Closed Source, Current Development Version is Free
PersistentFS is a fast and efficient POSIX-compliant file system that provides unlimited online storage in the Amazon Web Services (AWS) storage cloud.
InfiniteFTP provides an FTP interface to Amazon S3. While this isn’t a file system approach, you could add a FUSE layer over the top to make it appear as one: CurlFtpFS is a FUSE filesystem, built on libcurl, for accessing FTP hosts.
ElasticDrive - Closed Source, Paid Service
ElasticDrive is a Distributed Remote Storage Application that allows you to access your data regardless of location with the assurance your data is safe. It makes it possible for a remote storage resource, such as Amazon’s Simple Storage Service (S3), Nirvanix & Xdrive to behave like a local hard drive.
There are a number of options for architecture-based persistence that immediately spring to mind: various replication strategies such as DRBD, or distributed filesystems, perhaps backed by backups to an external service. The lag of such an approach would need to be considered. There are also options such as mirrored RAID (potentially for read speed and consistency), although RAID isn’t an option on FUSE-based devices.
I’ve only considered the options I could find for storage backed by S3 - my main reason for this was to avoid transfer charges. There will still be charges for the GET, PUT and LIST requests, so I also need to consider the cost implications of using S3 to assist with persistent storage.
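As a starting point for that costing, something like the following Python sketch could be used. The function name, the parameter names and especially the prices are placeholders I have made up for illustration; real figures would need to come from the current S3 price list. Transfer charges are left out, on the assumption above that the traffic stays between EC2 and S3.

```python
def estimate_monthly_cost(stored_gb, put_requests, get_requests,
                          list_requests, prices):
    """Rough monthly S3 bill: storage plus per-request charges.

    `prices` is a dict of assumed rates; nothing here is a real price.
    Request rates are expressed per 1,000 requests.
    """
    storage = stored_gb * prices["storage_per_gb"]
    puts = put_requests / 1000 * prices["put_per_1000"]
    gets = get_requests / 1000 * prices["get_per_1000"]
    lists = list_requests / 1000 * prices["list_per_1000"]
    return storage + puts + gets + lists


# Placeholder rates, for illustration only (not Amazon's price list).
PLACEHOLDER_PRICES = {
    "storage_per_gb": 0.10,
    "put_per_1000": 0.01,
    "get_per_1000": 0.001,
    "list_per_1000": 0.01,
}

if __name__ == "__main__":
    total = estimate_monthly_cost(
        stored_gb=50,
        put_requests=200_000,
        get_requests=1_000_000,
        list_requests=10_000,
        prices=PLACEHOLDER_PRICES,
    )
    print(f"Estimated monthly cost: {total:.2f}")
```

The interesting input is the request count rather than the storage size: a file system layer that issues many small GETs and PUTs per file operation could push the request charges well past the storage charge, which is exactly what the benchmarks would need to show.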
The next step is to take a deeper look at the options on offer and try to discover if there is a workable solution for persistent storage already available. I also want to run benchmarks against some sample configurations to see if the options perform under real world requirements. Finally, I will run some numbers to check if the costs stack up. By the time I’ve finished, the official Amazon offering may well be available - if not I shall document my findings!