Ephemeral storage, or instance storage, as-is, is like a /tmp folder, the contents of which disappear after a reboot. Of course, ephemeral drive contents aren't destroyed on a soft reboot, but they should be treated as if they were, since you can't realistically control or predict when your instance decides to die.
This has already been pointed out.
What I'd like to point out, is that if you create and configure your AMIs appropriately, you can still use the ephemeral storage to drastically improve (read) throughput, so long as you also keep EBS drives for the actual storage.
What I'm using at the moment is Linux (Ubuntu Tahr) instances with bcache. This is mainly because bcache kernel support is relatively new (IIRC, first one with bcache was 3.10), and you'd definitely want as recent a kernel as possible. Also, Tahr is the next LTS version of Ubuntu, and it's final when my project is close to launch ;)
Bcache, in its default configuration, allows you to benefit from the read speed of the ephemeral storage while giving you the persistence of EBS: It takes a fast cache device (ephemeral SSD) and uses it to speed up a slow device (EBS), writing through the cache device (that is, writing simultaneously to ephemeral cache and EBS).
This means that should an instance crash or otherwise be stopped, you can still mount the EBS volume directly without the cache, and access all your data as you would otherwise using only EBS volumes. You can also reconfigure the now wiped ephemeral devices and re-configure them as a cache to the EBS to get back to enjoying very fast reads and seeks.
My particular setup is two EBS devices, raided in stripe mode using mdadm + two ephemeral SSD devices also raided in the same manner. Then I've configured them with bcache, using the ephemeral array as the cache, and the EBS array as the "backup" device. The EBS drives can be any size, and you can always expand them (a bit tricky with EC2, because you have to create a snapshot of the current EBS volumes, and then create new larger ones based on that snapshot — you can't resize an existing EBS volume).
Of course, you'll have to create a script that runs inside your instance at startup to configure the ephemeral storage and attach it as a cache device on your EBS-backed backup device. I encourage reading up on, and experimenting with, mdadm and bcache.
For the record, testing with the Cassandra stress tool, I get better read performance with EBS volumes bcached with the ephemeral drives than I do with just striping the ephemeral drives. This is because of the algorithm used in bcache, which is very clever.
Using the ephemeral drives as a cache also reduced network traffic and is cost-effective, as it reduces I/O on EBS, and thereby your monthly bill.
Also note the different types of caching bcache provides:
- Write back: Use the SSD as read/write device, and only write to the backup device when pages need to be evicted from the cache. This is not useful for EC2 ephemeral setups, as it will render your backup device useless on a crash or stop.
- Write through: All writes go to both cache and backup. This ensures that the backup device is always as up-to-date as the cache device, and it can always be used without the cache device. Useful for EC2.
- Write around: All writes go directly to the backup device, and are not written to the cache device until a read request happens for that data some time in the future. Only reads are cached on the cache device. This is as safe as write through, and is useful if you know that your writes are not likely to be read in the near future. This avoids filling the cache device with data that isn't requested often, so that there's more space for what is requested data. A couple of examples could be a file upload server, a system where you write a lot of logging data, etc. If you know that your entire data set is significantly larger than the ephemeral storage size, this is most likely to be the most efficient option in a large numer of use cases.