
We have lots of files in S3 (>1B), and I'd like to compress them to reduce storage costs. What would be a simple and efficient way to do this?

Thank you

Alex

AlexV
  • S3 Batch is probably what you're looking for. – Dunedan Feb 21 '21 at 14:55
  • Is your aim simply to reduce costs? Are you familiar with [Object Storage Classes – Amazon S3](https://aws.amazon.com/s3/storage-classes/)? You can reduce the cost of storage, but it is a trade-off with durability and access speed. For example, the Glacier Deep Archive storage class can reduce storage costs by 95%, but data is not immediately accessible. This would be a LOT simpler than compressing files. Can you tell us more about how these files are used? – John Rotenstein Feb 21 '21 at 21:43
  • @JohnRotenstein Yes, storage costs are the main driver here, but I need to keep the data accessible. – AlexV Feb 23 '21 at 08:57
  • @Dunedan You mean use S3 Batch with a custom Lambda that will compress the files? – AlexV Feb 23 '21 at 08:58

1 Answer


Amazon S3 cannot compress your data.

You would need to write a program to run on an Amazon EC2 instance that would (see the sketch after this list):

  • Download the objects
  • Compress them
  • Upload the compressed objects back to Amazon S3
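
A minimal sketch of such a job in Python with boto3, assuming each object fits in memory and that writing the compressed copy back under a new `.gz` key is acceptable; the bucket and key names here are placeholders:

```python
import gzip
import boto3

s3 = boto3.client("s3")

def compress_object(bucket: str, key: str) -> None:
    """Download an S3 object, gzip it in memory, and upload the compressed copy."""
    # Download the original object into memory
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Compress it with gzip
    compressed = gzip.compress(body)

    # Upload the compressed copy under a new key (placeholder naming scheme)
    s3.put_object(
        Bucket=bucket,
        Key=key + ".gz",
        Body=compressed,
        ContentEncoding="gzip",
    )

    # Optionally delete the original once the upload has succeeded
    # s3.delete_object(Bucket=bucket, Key=key)
```

With more than a billion objects you would likely drive this from an object listing (for example S3 Inventory, or S3 Batch Operations invoking a Lambda function as suggested in the comments) rather than a single loop on one instance.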

An alternative is to use Storage Classes (a lifecycle-rule sketch follows this list):

  • If the data is infrequently accessed, use S3 Standard - Infrequent Access -- this is available immediately and is cheaper as long as data is accessed less than once per month
  • Glacier is substantially cheaper but takes some time to restore (speed of restore is related to cost)
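
If you go the storage-class route, a lifecycle rule can transition existing objects automatically, with no re-upload required. A minimal sketch with boto3, using a hypothetical bucket name and an assumed schedule (Standard-IA after 30 days, Glacier after 90):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and transition schedule -- adjust to your requirements.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-to-cheaper-storage",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```
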
John Rotenstein