
When syncing data to an empty directory in S3 using the AWS CLI, it is almost instant. However, when syncing to a large directory (several million folders), it takes a very long time before it even starts to upload/sync the files.

Is there an alternative method? It looks like it is trying to enumerate every file in the S3 directory before syncing. I don't need that check, and uploading the data without it would be fine.

King Dedede
  • That sounds like expected behavior. – Philip Kirkbride Jan 24 '17 at 18:37
  • Syncing 100 MB to a new directory takes almost no time, but syncing to a heavily used directory can take hours - hopefully there is an alternative! – King Dedede Jan 24 '17 at 18:38
  • One alternative that works for me is rclone (https://rclone.org). I didn't do exact benchmarks, but aws cli sync took hours to find the 30 files out of >5000 that had to be synced. rclone did the same in minutes. – mvtango Aug 23 '19 at 07:02
  • @PhilipKirkbride: I don't see why. Unless the OP is using `--delete`, the only files to consider / list are the local ones. – Pierre D Jan 30 '20 at 02:44
  • BTW, I wish `aws s3 [ls|cp|sync]` had options `--min min-key` and `--max max-key`. When we wrote Java equivalents to these commands (many years ago), we made good use of the S3 listing `Marker`. See a Python example of the same idea in https://stackoverflow.com/a/51372405/758174. – Pierre D Jan 30 '20 at 02:48
  • @PierreD just pointing out that it is expected, as confirmed by the accepted answer: all files in the bucket are enumerated. – Philip Kirkbride Jan 30 '20 at 06:09
  • @PhilipKirkbride: what I mean is that, to me, it is *unexpected* given that: 1. this is clearly avoidable and suboptimal, and 2. usually `awscli` is well implemented and fast. In other words, I don't contest the fact that the current implementation of `aws s3 sync` is slow in this case, but I am _surprised_ by it. You make it sound like it is _logical_, which it is not. – Pierre D Jan 31 '20 at 18:47
  • @PierreD yes, good point; hopefully they will update this. – Philip Kirkbride Jan 31 '20 at 23:28
  • If you don't need MD5 checks of every file, you can use the `--size-only` switch per [this answer](https://stackoverflow.com/a/42787035/3281039); see the example below. – user108569 Mar 16 '22 at 16:26
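
Following up on that last comment, this is roughly what the size-only variant looks like (the bucket name is a placeholder). Note that sync still has to list the remote objects, so this mainly helps when the per-file comparison, rather than the listing itself, is the bottleneck:

aws s3 sync . s3://mybucket/ --size-only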

2 Answers


The sync command needs to enumerate all of the objects in the bucket to determine whether each local file already exists in the bucket and whether it is the same as the local file. The more objects you have in the bucket, the longer this will take.

If you don't need this sync behavior, just use a recursive copy command like:

aws s3 cp --recursive . s3://mybucket/

and this should copy all of the local files in the current directory to the bucket in S3.
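
If you want to check what such a recursive copy would transfer before running it for real, the CLI's `--dryrun` flag should list the operations without performing them:

aws s3 cp --recursive . s3://mybucket/ --dryrun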

garnaat
  • Danger! Using `aws s3 cp` could end up being expensive as you'll be uploading your files over and over if you run this copy multiple times. A better solution would likely be to keep using `aws s3 sync` but increase the `max-concurrent-requests` setting: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests – Firefishy Aug 16 '20 at 21:31
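
As a rough sketch of the tuning mentioned in the comment above (assuming the `default` profile; adjust the profile name and value to taste), the concurrency setting can be raised with `aws configure set`:

aws configure set default.s3.max_concurrent_requests 50

The default is 10 concurrent requests, so raising it trades more bandwidth and CPU for faster transfers.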

If you use the unofficial s3cmd from S3 Tools, you can pass the --no-check-md5 option to sync to skip the MD5 checksum comparison and significantly speed up the process.

--no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.

Source: https://s3tools.org/usage

Example: s3cmd --no-check-md5 sync /directory/to/sync s3://mys3bucket/

spoonsearch