I'm creating periodic snapshots of my EBS volume using a Scheduled Cron expression rule (thanks, John C).

My data is all binary, and I suspect that the automatic compression AWS performs on it will actually enlarge the resulting snapshots.

Is there a way to instruct AWS to not employ compression when creating snapshots (so I could compare the snapshot's size with/without compression)?

Note:
The AWS documentation page "Creating an Amazon EBS Snapshot" seems to indicate that compression is mandatory.

boardrider
  • Have you seen a situation in which the snapshots were actually larger? – jarmod Mar 22 '18 at 17:15
  • The only way (afaik) to learn the actual size of the snapshots is via the [daily cost and usage report](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-reports-gettingstarted-turnonreports.html). Divide the cost by the rate to determine how much is stored. Delete an older snapshot of the same volume and you will observe the next newest snapshot's cost go up by some fraction of the cost you eliminated by the deletion, as the costs are reallocated. You'll also see newer snapshots of completely unchanged volumes costing nothing. I can't imagine that you are overpaying. – Michael - sqlbot Mar 22 '18 at 23:00
  • I'm not sure I could test that, @jarmod, without being able to stop the AWS automatic compression. However, as a general rule, since compression incurs overhead in maintaining its tables and pointers, if my data is binary, which to AWS would look like a random stream, then almost by definition the 'compressed' data would take more space than the uncompressed data. – boardrider Mar 22 '18 at 23:44
  • Wow. Now that _is_ complicated. Thanks, @Michael. – boardrider Mar 22 '18 at 23:45
  • Binary data does not necessarily mean "uncompressable". – Matt Houser Mar 22 '18 at 23:57
  • Related conversation suggesting that EBS snapshots are no longer compressed: https://forums.aws.amazon.com/message.jspa?messageID=737524 – jarmod Mar 23 '18 at 12:41
  • True, @Matt, but see https://stackoverflow.com/a/4716351/1656850. – boardrider Mar 24 '18 at 22:56
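Whether "binary" data is incompressible, as debated in the comments above, is easy to check locally. The sketch below uses Python's `zlib` purely as a stand-in, since AWS does not document which algorithm, if any, EBS snapshots use. It compares high-entropy bytes (similar to already-compressed or encrypted files) with structured binary records; the seed, sizes, and record layout are arbitrary choices for illustration.

```python
import random
import struct
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib as a stand-in compressor."""
    return len(zlib.compress(data)) / len(data)


SIZE = 1 << 20  # 1 MiB of test data

# High-entropy bytes (seeded so runs are repeatable), like encrypted or pre-compressed files.
random_bytes = random.Random(42).randbytes(SIZE)

# Structured binary data: fixed-size little-endian records (int32 + two float64 = 20 bytes each).
records = b"".join(struct.pack("<idd", i % 10, 1.5, 2.5) for i in range(SIZE // 20))

if __name__ == "__main__":
    print(f"random bytes:     {compression_ratio(random_bytes):.4f}")
    print(f"structured bytes: {compression_ratio(records):.4f}")
```

The random input typically comes out slightly larger than the original (deflate's framing overhead), while the structured records shrink dramatically, so both boardrider's and Matt's points hold depending on what the "binary" data actually is. `random.Random.randbytes` requires Python 3.9 or later.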

3 Answers

You have no control over the compression used for EBS snapshots.

EBS snapshots are incremental (except for the first snapshot). The data is compressed based on AWS's own heuristics, and you have no visibility into the actual size of the compressed data.

When you're looking at an EBS snapshot, the snapshot's "size" will always be reported as the originating EBS volume's size, regardless of the actual size of the snapshot.
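To make the incremental behavior concrete, here is a toy Python model. It is a hypothetical sketch, not how EBS is implemented: the volume is split into fixed-size blocks, the first snapshot stores every block, later snapshots store only blocks that changed, and the reported size always equals the volume size. The block size and the dict layout are made up for illustration, and compression is deliberately left out because AWS does not expose it.

```python
from typing import List, Optional

BLOCK_SIZE = 4  # hypothetical, tiny so the example is readable; not EBS's real block size


def split_blocks(volume: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a volume image into fixed-size blocks."""
    return [volume[i:i + block_size] for i in range(0, len(volume), block_size)]


def take_snapshot(volume: bytes, previous: Optional[List[bytes]] = None) -> dict:
    """Model an incremental block-level snapshot.

    The first snapshot (previous is None) stores every block; later snapshots
    store only blocks that differ from, or did not exist in, the previous one.
    """
    blocks = split_blocks(volume)
    if previous is None:
        changed = list(range(len(blocks)))
    else:
        changed = [i for i, block in enumerate(blocks)
                   if i >= len(previous) or block != previous[i]]
    return {
        "blocks": blocks,
        "changed_blocks": changed,
        "stored_bytes": sum(len(blocks[i]) for i in changed),
        # The reported size is the source volume's size, not what this snapshot stores.
        "reported_size": len(volume),
    }


first = take_snapshot(b"AAAABBBBCCCCDDDD")
second = take_snapshot(b"AAAAXXXXCCCCDDDD", previous=first["blocks"])
print(first["stored_bytes"], first["reported_size"])  # 16 16
print(second["changed_blocks"], second["stored_bytes"], second["reported_size"])  # [1] 4 16
```

Only one 4-byte block changed between the two "volumes", so the second snapshot stores 4 bytes while still reporting the full 16-byte volume size.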

Matt Houser
  • Thanks for the answer, @Matt. I'm more concerned with what I pay for the snapshots: does Amazon charge me for the actual size of the snapshot, or for the size of the EBS volume the snapshot is backing up? – boardrider Apr 04 '18 at 16:00
  • You only pay for the data saved. So redundant blocks are not charged. – Matt Houser Apr 04 '18 at 16:03
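Michael - sqlbot's comment on the question suggests estimating how much a snapshot actually stores by dividing its billed cost by the per-GB rate. A rough sketch of that arithmetic follows, assuming the per-GB-month rate is prorated evenly over the days of the month. The rate constant is a made-up placeholder, not an AWS price; check the EBS pricing page for your region.

```python
HYPOTHETICAL_RATE_USD_PER_GB_MONTH = 0.05  # placeholder, NOT a quoted AWS price


def estimate_stored_gb(daily_cost_usd: float,
                       rate_usd_per_gb_month: float,
                       days_in_month: int) -> float:
    """Estimate how many GB a snapshot stores from one day's billed cost.

    Assumes the per-GB-month rate is prorated evenly over the days in the month,
    so one day's cost is roughly stored_gb * rate / days_in_month.
    """
    if rate_usd_per_gb_month <= 0 or days_in_month <= 0:
        raise ValueError("rate and days_in_month must be positive")
    return daily_cost_usd * days_in_month / rate_usd_per_gb_month


# Example: a snapshot line item billed $0.10 for one day of a 30-day month.
print(round(estimate_stored_gb(0.10, HYPOTHETICAL_RATE_USD_PER_GB_MONTH, 30), 2))  # 60.0
```

As Michael notes, the per-snapshot figure shifts when an older snapshot is deleted, because its data is reallocated to newer snapshots, so the estimate describes a snapshot only as of that billing day.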

I don't think EBS snapshots are currently compressed (I'm not sure whether they were earlier), and I could not find any reference to compression in the AWS documentation either. That is why the size of the initial snapshot is the same as the size of the volume. After the first snapshot, subsequent snapshots are incremental: only the blocks on the device that have changed or been added since the last snapshot are saved in the new snapshot.

You can refer to the blog post on how EBS snapshot backup and restore work.

RajBedi

By querying the Cost and Usage Report (CUR) data, which at any given time reflects usage up to about two days earlier, you can pull the associated cost metrics, including the actual snapshot size. AWS DOES NOT MAKE THIS EASY. AWS provides mechanisms that calculate the change between snapshots in order to provide cost metrics through the use of tags.

The oldest snapshot (which should be the largest) is compressed, and according to the AWS engineer I am chatting with right now, we should expect to see standard compression ratios; he told me the compression should not be much different from zipping. So, looking at the first snapshot of each volume, I see compression ranging from 0% to 99.99% per volume, with an average of 86%, for a fairly large Oracle EC2 instance with 20+ volumes.
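Assuming the percentages above compare each volume's first-snapshot stored size (estimated from CUR) with its provisioned volume size, the arithmetic can be sketched as follows. The volume IDs and sizes are hypothetical, not the author's data. The sketch also shows that a simple mean across volumes can differ noticeably from a size-weighted figure. Depending on how EBS treats never-written blocks, such a "savings" figure may mix genuine compression with space that was simply never written.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FirstSnapshot:
    volume_id: str         # hypothetical volume identifier
    volume_size_gb: float  # provisioned size of the source volume
    stored_gb: float       # stored size of its first snapshot, e.g. estimated from CUR costs


def savings_percent(snap: FirstSnapshot) -> float:
    """Percentage of the volume size that the snapshot does NOT store."""
    return 100.0 * (1.0 - snap.stored_gb / snap.volume_size_gb)


def summarize(snaps: List[FirstSnapshot]) -> Tuple[Dict[str, float], float, float]:
    """Return per-volume savings, their simple mean, and a size-weighted overall figure."""
    per_volume = {s.volume_id: savings_percent(s) for s in snaps}
    simple_mean = sum(per_volume.values()) / len(per_volume)
    total_size = sum(s.volume_size_gb for s in snaps)
    total_stored = sum(s.stored_gb for s in snaps)
    weighted = 100.0 * (1.0 - total_stored / total_size)
    return per_volume, simple_mean, weighted


# Hypothetical inputs for illustration only.
snapshots = [
    FirstSnapshot("vol-a", 100.0, 10.0),   # about 90% saved
    FirstSnapshot("vol-b", 500.0, 500.0),  # nothing saved
    FirstSnapshot("vol-c", 1000.0, 0.1),   # about 99.99% saved
]
per_volume, simple_mean, weighted = summarize(snapshots)
print({k: round(v, 2) for k, v in per_volume.items()})
print(round(simple_mean, 2), round(weighted, 2))
```

Here the simple mean (about 63%) and the size-weighted figure (about 68%) disagree because one large volume dominates the total, so it is worth stating which kind of average a reported "86%" is.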