EC2 Persistent Storage for the Impatient?

As I previously posted, Amazon have announced that persistent storage is on its way for EC2. Sadly the public launch date has not yet been disclosed - it’s “coming later in the year”. In the meantime, this leaves the question of what to do when you need data to persist.

There are a number of options, especially when you start to consider scalability and fault tolerance. I won’t dare claim I’ve considered all the options out there - I’ve simply started to look at what the immediate options are for persistent storage.

Without further ado, then, on to the technology. I’ve found a number of choices. This isn’t a ‘how to’ - it is more a summary of the solutions I have found so far that are on my list for consideration. Hopefully, if one of the options fits my needs, I will provide a guide at a later date!

At a high level the solutions can be split into two categories - file system interfaces to S3, or architecture-based replication and backup. There are likely to be other options that allow you to use persistent storage outside the Amazon cloud - for the time being I’m trying to stay within the bubble. There is also a cross-over between the two strategies - for example, using a particular architecture together with a file system interface.

File System Interfaces to Amazon S3

The file system interfaces to S3 are typically either based on the FUSE (Filesystem in Userspace) library or implemented as an NBD (network block device). FUSE allows developers to create interfaces (in user space) that ultimately surface to the user as a standard file system, while NBD allows interfaces that appear as a raw, unformatted physical disk (allowing the user to format the disk as they please).
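To make the NBD idea a little more concrete, here is a minimal sketch of how a raw block device might be mapped onto fixed-size objects in a key-value store. It does not reflect how any particular S3 interface is implemented: the `ChunkedBlockDevice` class, the `CHUNK_SIZE` value and the `chunk-NNNNNNNN` key scheme are all hypothetical, and a plain Python dict stands in for an S3 bucket.

```python
CHUNK_SIZE = 4096  # hypothetical chunk size in bytes, chosen only for illustration


class ChunkedBlockDevice:
    """Toy block device that stores fixed-size chunks in a key-value store."""

    def __init__(self, size, store=None):
        self.size = size
        # A dict stands in for an S3 bucket (assumption for this sketch).
        self.store = {} if store is None else store

    def _key(self, index):
        # Hypothetical naming scheme: one object per chunk.
        return f"chunk-{index:08d}"

    def _load(self, index):
        # Chunks that were never written read back as zeros, like a blank disk.
        return bytearray(self.store.get(self._key(index), bytes(CHUNK_SIZE)))

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside device bounds")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, CHUNK_SIZE)
            n = min(CHUNK_SIZE - start, len(data) - pos)
            chunk = self._load(index)
            chunk[start:start + n] = data[pos:pos + n]
            # In a real backend this would be one PUT per touched chunk.
            self.store[self._key(index)] = bytes(chunk)
            pos += n

    def read(self, offset, length):
        if offset < 0 or offset + length > self.size:
            raise ValueError("read outside device bounds")
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), CHUNK_SIZE)
            n = min(CHUNK_SIZE - start, length - len(out))
            # In a real backend this would be one GET per touched chunk.
            out += self._load(index)[start:start + n]
        return bytes(out)
```

A write that straddles a chunk boundary touches two objects, which hints at why the request count (and therefore cost and latency) depends on the chunk size and the access pattern.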

FUSE Interfaces

NBD Interfaces

Architecture Considerations

There are a number of options for architecture-based persistence that immediately spring to mind - for example, replication strategies such as DRBD, or distributed filesystems, perhaps backed by backups to an external service. The lag of such an approach would need to be considered. There are also options such as mirrored RAID (potentially for read speed and consistency), although RAID isn’t an option on FUSE-based devices.

Cost Considerations

I’ve only considered the options I could find for storage backed by S3 - my main reason for this was to avoid transfer charges. There will still be charges for GET, PUT and LIST requests, so I also need to consider the cost implications of using S3 to assist with persistent storage.
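As a rough way to run those numbers, the request charges can be estimated by multiplying the request counts by a per-request price. The sketch below is only an illustration: the function name is made up, and the prices in the usage example are placeholders rather than Amazon's actual rates, which should be taken from the current S3 pricing page.

```python
from decimal import Decimal


def estimate_request_cost(gets, puts, lists, prices):
    """Return the total request charge for the given request counts.

    `prices` maps "GET", "PUT" and "LIST" to a per-request price.
    The caller supplies the prices; none are built in.
    """
    return (
        gets * prices["GET"]
        + puts * prices["PUT"]
        + lists * prices["LIST"]
    )


# Placeholder per-request prices, NOT real Amazon rates.
example_prices = {
    "GET": Decimal("0.001"),
    "PUT": Decimal("0.01"),
    "LIST": Decimal("0.01"),
}

total = estimate_request_cost(gets=1000, puts=100, lists=10, prices=example_prices)
print(f"Estimated request charges: {total}")
```

Using `Decimal` rather than floats avoids rounding surprises when very small per-request prices are multiplied by large request counts.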

What Next?

The next step is to take a deeper look at the options on offer and try to discover if there is a workable solution for persistent storage already available. I also want to run benchmarks against some sample configurations to see if the options perform under real-world requirements. Finally, I will run some numbers to check if the costs stack up. By the time I’ve finished, the official Amazon offering may well be available - if not, I shall document my findings!

Updated on 07 February 2019
First published by Chris Anderton on 12 June 2008
© Chris Anderton 2019
"EC2 Persistent Storage for the Impatient?" by Chris Anderton at TheWebFellas is licensed under a Creative Commons Attribution 4.0 International License.