How do I take a backup of AWS EC2 instance/ephemeral storage?

Question

I keep my database in /mnt, on the ephemeral storage that comes with the EC2 instance. To take a backup using the EC2 API tools, I need a volume ID, but in the AWS console I can only find the volume ID of the 8 GB root volume.

What should I do if I want to back up the ephemeral storage? Is there any alternative for backing up instance storage?

Hey @Smita, did you manage to back up the EC2 instance storage to EBS? (I'm facing almost the same issue at the moment) — Jadav Bheda, Apr 29 '15 at 14:00

score 32 · Accepted Answer · edited Apr 13 '17 at 12:13

First and foremost, you should never store anything of lasting value on ephemeral storage in Amazon EC2, unless you know exactly what you are doing and are prepared to always have point-in-time backups etc. Your question suggests that you might be mistaken about the concept of ephemeral storage, the difference between Amazon EC2 instance storage and Amazon EBS, and the significant implications for data safety and backup requirements:

Ephemeral storage is lost on stop/start cycles and can generally go away, so you definitely don't want to put anything of lasting value there. Only put temporary data there that you can afford to lose or rebuild easily, like a swap file or strictly temporary data in use during computations. You might, for example, store huge indexes there, but you must be prepared to rebuild them after the storage has been cleared for whatever reason (instance stop or termination, hardware failure, ...).

That's one of the many reasons Eric Hammond summarized excellently in "You Should Use EBS Boot Instances on Amazon EC2", which outlines the history of and differences between the two storage concepts and assesses the few remaining possible benefits of ephemeral storage (mainly that it is plentiful and free).

Problem/Solution

These explanations should clarify why you are unable to back up the ephemeral storage volumes with a mechanism that applies solely to EBS volumes (i.e. EBS snapshots). Instead, you can back up instance storage with a regular operating-system-level backup tool of your choice; Duplicity is a popular choice and can optionally use Amazon S3 as its storage target, as addressed in my answer to Easiest to use backup software for live linux server.
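As a minimal sketch of a plain file-level backup (a full tarball, not Duplicity, which adds incremental and encrypted backups and remote targets such as S3), the Python function below archives a directory such as /mnt into a timestamped .tar.gz. The helper name and the destination directory are hypothetical.

```python
import os
import tarfile
import time


def archive_directory(src_dir: str, dest_dir: str) -> str:
    """Create a gzip-compressed tarball of src_dir inside dest_dir.

    Hypothetical helper: a plain full backup, with none of Duplicity's
    incremental, encryption, or remote-upload features.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    # Name the archive after the source directory, e.g. "mnt" for /mnt.
    base = os.path.basename(os.path.normpath(src_dir)) or "root"
    archive_path = os.path.join(dest_dir, f"{base}-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Recursively add the tree; paths inside the archive start at base/.
        tar.add(src_dir, arcname=base)
    return archive_path
```

For example, archive_directory("/mnt", "/backup") would write /backup/mnt-<timestamp>.tar.gz, assuming /backup is on an EBS volume (or is later copied to S3). For a live database, archive a dump or a snapshot produced by the database itself rather than its open data files, which may be mid-write.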

Thanks for the clarification, and the alternative solution is really helpful :) — Smita, May 26 '12 at 13:05

score 7 · Answer 2 · answered Feb 14 '14 at 08:59

Ephemeral storage, or instance storage, as-is, is like a /tmp folder, the contents of which disappear after a reboot. Of course, ephemeral drive contents aren't destroyed on a soft reboot, but they should be treated as if they were, since you can't realistically control or predict when your instance decides to die.

This has already been pointed out.

What I'd like to point out is that if you create and configure your AMIs appropriately, you can still use the ephemeral storage to drastically improve (read) throughput, as long as you also keep EBS drives for the actual storage.

What I'm using at the moment is Linux (Ubuntu 14.04 "Trusty Tahr") instances with bcache. This is mainly because bcache kernel support is relatively new (IIRC, the first kernel with bcache was 3.10), and you'd definitely want as recent a kernel as possible. Also, Trusty Tahr is the next LTS version of Ubuntu, and it will be final around the time my project is close to launch ;)
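If you want a provisioning script to check that kernel requirement, a small Python sketch like this compares the running kernel release against 3.10, the version recalled above as the first with bcache. It only looks at the version number and cannot tell whether the distribution actually built the bcache module; the function name is hypothetical.

```python
import platform
import re
from typing import Optional

# bcache was merged into the mainline Linux kernel in 3.10.
BCACHE_MIN_KERNEL = (3, 10)


def kernel_has_bcache(release: Optional[str] = None) -> bool:
    """Return True if the kernel release string is at least 3.10.

    Hypothetical helper: checks the version number only; the bcache module
    may still be missing from a particular distribution kernel.
    """
    if release is None:
        release = platform.release()  # e.g. "3.13.0-24-generic"
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Tuple comparison handles (3, 2) < (3, 10) correctly.
    version = (int(match.group(1)), int(match.group(2)))
    return version >= BCACHE_MIN_KERNEL
```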

Bcache, in its default configuration, allows you to benefit from the read speed of the ephemeral storage while giving you the persistence of EBS: It takes a fast cache device (ephemeral SSD) and uses it to speed up a slow device (EBS), writing through the cache device (that is, writing simultaneously to ephemeral cache and EBS).

This means that should an instance crash or otherwise be stopped, you can still mount the EBS volume directly without the cache and access all your data as you would when using only EBS volumes. You can also reinitialize the now-wiped ephemeral devices and reattach them as a cache to the EBS volume to get back to enjoying very fast reads and seeks.

My particular setup is two EBS devices striped (RAID 0) using mdadm, plus two ephemeral SSD devices striped the same way. I then configured them with bcache, using the ephemeral array as the cache and the EBS array as the "backing" device. The EBS drives can be any size, and you can always expand them (at the time of writing this was a bit tricky on EC2, because you had to create a snapshot of the current EBS volumes and then create new, larger ones based on that snapshot; an existing EBS volume could not be resized).

Of course, you'll have to create a script that runs inside your instance at startup to configure the ephemeral storage and attach it as a cache device to your EBS-backed backing device. I encourage reading up on, and experimenting with, mdadm and bcache.
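A startup script along those lines might run commands like the ones generated below. This is a hedged Python sketch that only builds the command strings and executes nothing. It assumes the EBS RAID 0 array was formatted once with `make-bcache -B` and is already registered and running as /dev/bcache0 (after a stop/start you may first have to force it to run without its old cache; see the bcache documentation), and that the instance-store disks come back blank. The device names /dev/md1, /dev/xvdb and /dev/xvdc are placeholders.

```python
from typing import List, Sequence

VALID_MODES = ("writethrough", "writeback", "writearound", "none")


def bcache_boot_commands(
    ephemeral_devices: Sequence[str],
    cache_md: str = "/dev/md1",        # hypothetical RAID 0 cache array
    bcache_dev: str = "bcache0",       # assumed existing bcache device
    mode: str = "writethrough",
) -> List[str]:
    """Build (but do not run) shell commands that re-create the ephemeral
    RAID 0 cache and attach it to an existing bcache backing device.

    Hypothetical helper; review every command before running it as root.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    sysfs = f"/sys/block/{bcache_dev}/bcache"
    return [
        # Stripe the blank instance-store disks into one RAID 0 array.
        f"mdadm --create {cache_md} --run --level=0 "
        f"--raid-devices={len(ephemeral_devices)} {' '.join(ephemeral_devices)}",
        # Write a bcache cache-set superblock onto the new array.
        f"make-bcache -C {cache_md}",
        # Register the cache with the kernel (udev usually does this already).
        f"echo {cache_md} > /sys/fs/bcache/register",
        # Attach by cache-set UUID; assumes bcache-super-show prints cset.uuid.
        f"echo $(bcache-super-show {cache_md} | awk '/cset.uuid/ {{print $2}}') "
        f"> {sysfs}/attach",
        # Write-through keeps EBS current so losing the cache loses nothing.
        f"echo {mode} > {sysfs}/cache_mode",
    ]
```

You could write these lines into a shell script run by cloud-init or rc.local and then mount /dev/bcache0 as usual; generating them from Python simply keeps the device list in one place.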

For the record, testing with the Cassandra stress tool, I get better read performance with EBS volumes bcached with the ephemeral drives than I do with just striping the ephemeral drives. This is because of the algorithm used in bcache, which is very clever.

Using the ephemeral drives as a cache also reduces network traffic and is cost-effective, as it reduces I/O on EBS and thereby your monthly bill.

Also note the different types of caching bcache provides:

Write back: Use the SSD as a read/write device, and only write to the backing device when pages need to be evicted from the cache. This is not useful for EC2 ephemeral setups, as it will render your backing device useless after a crash or stop.
Write through: All writes go to both the cache and the backing device. This ensures that the backing device is always as up to date as the cache device, so it can always be used without the cache device. Useful for EC2.
Write around: All writes go directly to the backing device and are not written to the cache device until a read request for that data happens some time later. Only reads are cached on the cache device. This is as safe as write through and is useful if you know that your writes are not likely to be read in the near future. It avoids filling the cache device with rarely requested data, leaving more space for data that is requested. Examples include a file upload server or a system that writes a lot of logging data. If you know that your entire data set is significantly larger than the ephemeral storage, this is likely the most efficient option in a large number of use cases.
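To make the trade-off between these modes concrete, here is a toy Python model. It is not bcache's real implementation (no eviction, no sequential-I/O bypass, no block sizes): a dict stands in for each device, and `crash()` simulates losing the instance storage. The class name is hypothetical.

```python
from typing import Dict, Set


class ToyBcache:
    """Toy model of bcache's three cache modes (illustrative only)."""

    MODES = ("writeback", "writethrough", "writearound")

    def __init__(self, mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache: Dict[int, bytes] = {}    # stands in for the ephemeral SSD
        self.backing: Dict[int, bytes] = {}  # stands in for the EBS array
        self.dirty: Set[int] = set()         # blocks not yet written to EBS

    def write(self, block: int, data: bytes) -> None:
        if self.mode == "writeback":
            self.cache[block] = data
            self.dirty.add(block)            # reaches EBS only on flush()
        elif self.mode == "writethrough":
            self.cache[block] = data
            self.backing[block] = data       # both devices updated at once
        else:  # writearound
            self.backing[block] = data
            self.cache.pop(block, None)      # drop any stale cached copy

    def read(self, block: int) -> bytes:
        if block not in self.cache:
            self.cache[block] = self.backing[block]  # populate cache on read
        return self.cache[block]

    def flush(self) -> None:
        # Write-back eviction: copy dirty blocks to the backing device.
        for block in self.dirty:
            self.backing[block] = self.cache[block]
        self.dirty.clear()

    def crash(self) -> None:
        # Instance storage is lost, and unflushed write-back data with it.
        self.cache.clear()
        self.dirty.clear()
```

Writing a block and then calling crash() in each mode shows the point made above: only write-back leaves the EBS side missing data, while write-around additionally leaves the cache cold until the first read of that block.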

I know it's been 8 years since this post, but what a great post! You can now resize EBS disks on the fly, and there are a bunch of new instance types with a wide range of instance storage (r5d, m5d, r5ad, m5ad, r6id, m6id, r6gd, m6gd). These options make your post truly valuable for improving performance. The only hard part now is writing the cloud-init script to set up bcache automatically on reboot. Got anything to share? — site80443, Aug 07 '22 at 21:13
@site80443 Sure :) As you say, it's been 8 years, so I don't remember much. It was a startup I was a minority co-owner of, which no longer exists, so I'm confident sharing the code I found. It's the "not getting paid, exploring and learning, don't have time, wanna have fun too" sort of code, so I expect some head scratching and bepuzzlement, even face-palming. But have a look around. I've redacted it as well as I could, though it shouldn't really matter as all servers are now dead, but please let me know if you find anything not redacted: https://github.com/DanielSmedegaardBuus/ec2-craycray-redacted — DanielSmedegaardBuus, Aug 09 '22 at 05:37
While I use neither node, nginx nor cassandra (I work with LAMP) I found a lot of useful stuff in that repo, especially the scripts and the comments. Thank you very much! Also, I've emailed you what looked like information left unredacted. — site80443, Aug 10 '22 at 23:14

score 2 · Answer 3 · answered Jun 02 '13 at 23:26

If you are able to configure a software RAID mirror, you can attach an EBS-backed disk to the instance, configure a mirror, then wait for replication to complete. I have successfully used this method to move "ephemeral" data into EBS after I had already created the instance (and I did not want to shut down and reboot).

Once you have the data on EBS, back it up with EBS snapshots.
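Snapshots can then be scripted. As a sketch that swaps the older EC2 API tools for a boto3 EC2 client (an assumption, not what the answer used), this function snapshots every EBS volume attached to an instance. Instance-store disks never show up in describe_volumes, which is also why the asker could only find the root volume's ID. The function name, instance ID and description are placeholders.

```python
from typing import Any, List


def snapshot_instance_volumes(ec2: Any, instance_id: str, description: str) -> List[str]:
    """Snapshot every EBS volume attached to `instance_id`.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Pagination is ignored for brevity. Ephemeral (instance-store) disks are
    not EBS volumes, so they are never returned and never snapshotted.
    """
    response = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    snapshot_ids = []
    for volume in response["Volumes"]:
        snap = ec2.create_snapshot(
            VolumeId=volume["VolumeId"], Description=description
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids
```

For example, snapshot_instance_volumes(boto3.client("ec2"), "i-0123456789abcdef0", "nightly"). If the data spans several volumes (such as an mdadm array), separate snapshots are not taken at the same instant, so consider quiescing writes or freezing the filesystem first.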

This method works particularly well when you have multiple copies of the data running on different identical instances, but you only need one of them persisted to EBS (in my case, using Couchbase server, the CB data was on ephemeral drives, but I had one of the instances mirrored to EBS such that all the data on my cluster ended up in EBS).

score 1 · Answer 4 · answered Jan 28 '15 at 16:52

Any file-level backup solution (not based on EBS snapshots) can back-up your ephemeral storage. That said, you should consider when to use ephemeral storage, and have good reason to use it for persistent data. For certain applications, like Cassandra, this is the recommended configuration. In that case your backup solution will mostly dump the data from the ephemeral storage to an EBS volume that will be snapshotted or directly to S3. In some cases you can define replication and make sure all data in the ephemeral device is also replicated to EBS volumes.
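The "dump to an EBS volume" option can be as simple as a recursive copy, after which the EBS volume is snapshotted. A minimal Python sketch, assuming the ephemeral data lives in a hypothetical /mnt/ephemeral/data and an EBS volume is mounted at a hypothetical /mnt/ebs-backup; for Cassandra you would normally copy a consistent snapshot (for example one taken with nodetool snapshot) rather than the live data files.

```python
import shutil


def dump_to_ebs(ephemeral_dir: str, ebs_dir: str) -> str:
    """Copy the contents of ephemeral_dir into ebs_dir and return ebs_dir.

    Hypothetical helper. Files are copied with shutil.copy2 (preserving
    timestamps) and existing files are overwritten; files deleted from the
    source are NOT removed from the destination, unlike `rsync --delete`.
    Requires Python 3.8+ for dirs_exist_ok.
    """
    return shutil.copytree(ephemeral_dir, ebs_dir, dirs_exist_ok=True)
```

For example, dump_to_ebs("/mnt/ephemeral/data", "/mnt/ebs-backup/data") followed by an EBS snapshot of the backup volume.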
