
When I run `aws s3 sync` in the terminal twice in a row, it doesn't sync anything the second time, which is great. It shouldn't. But if I run my build process and then run `aws s3 sync` programmatically, twice in a row, it syncs all the files both times, as if my build process were changing something the second time.

Can't figure out what might be happening. Any ideas?

My build process is basically `pug source/ --out static-site/` and `stylus -c styles/ --out static-site/styles/`.

Costa Michailidis

3 Answers


According to the AWS CLI documentation for `s3 sync` (http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html):

S3 sync compares the size of the file and the last modified timestamp to see if a file needs to be synced.

In your case, I suspect the build system gives the files a newer timestamp even though their size hasn't changed.
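To see the effect described above, here is a small Python sketch (Python and the helper name `rebuild_bumps_mtime` are my own choices, not part of any tool). It writes a file, backdates it as if it had already been synced, and then writes the identical bytes again, the way a build that regenerates every output on each run would. The size stays the same but the modification time moves forward, which is the combination that makes `aws s3 sync` upload the file again.

```python
import os
import tempfile
import time


def rebuild_bumps_mtime(content: str) -> tuple[bool, bool]:
    """Simulate a build that rewrites an unchanged output file.

    Returns (size_changed, mtime_newer) for the second write.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.html")  # hypothetical build output

        # First build: write the file, then backdate it one hour so it
        # stands in for a file that was already synced earlier.
        with open(path, "w") as f:
            f.write(content)
        hour_ago = time.time() - 3600
        os.utime(path, (hour_ago, hour_ago))
        before = os.stat(path)

        # Second build: the identical bytes are written again.
        with open(path, "w") as f:
            f.write(content)
        after = os.stat(path)

    size_changed = after.st_size != before.st_size
    mtime_newer = after.st_mtime > before.st_mtime
    return size_changed, mtime_newer


print(rebuild_bumps_mtime("<h1>Hello</h1>"))  # (False, True)
```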

Naraen
  • There is an `--exact-timestamps` option where same-sized items will be ignored when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version. – John Rotenstein Apr 21 '17 at 03:28
  • Hmmm... doesn't really help. And to fix this I'd need to interrupt pug's compiling command to run cmp or something. I can't imagine how to start doing that. I think I'll just forego this item. – Costa Michailidis Apr 21 '17 at 04:54
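The `cmp` idea from the comment above doesn't actually require interrupting pug. Here is a hedged Python sketch (the function `promote_changed_files` and the staging-directory layout are hypothetical): let pug and stylus write into a scratch directory, then copy into `static-site/` only the files whose bytes differ, using `filecmp.cmp(..., shallow=False)` as the equivalent of `cmp`. Unchanged files keep their old modification times, so after the first deploy they are older than the uploaded S3 objects and the default `aws s3 sync` comparison has no reason to upload them again.

```python
import filecmp
import os
import shutil


def promote_changed_files(staging_dir: str, output_dir: str) -> list[str]:
    """Copy files from staging_dir into output_dir only if their bytes differ.

    Files that are byte-for-byte identical are skipped, so the copy already
    in output_dir keeps its old modification time. Returns the relative
    paths that were copied, sorted.
    """
    filecmp.clear_cache()  # don't reuse comparison results from earlier calls
    copied = []
    for root, _dirs, files in os.walk(staging_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, staging_dir)
            dst = os.path.join(output_dir, rel)
            # shallow=False compares file contents, like running `cmp` on the pair.
            if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)  # new or changed file gets a fresh mtime
            copied.append(rel)
    return sorted(copied)
```

In use, the build would write to a scratch directory, e.g. `pug source/ --out build-tmp/` and `stylus -c styles/ --out build-tmp/styles/` (`build-tmp` is a made-up name), followed by `promote_changed_files("build-tmp", "static-site")` and the usual `aws s3 sync`. This sketch doesn't remove files from `static-site/` that the build no longer produces.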

AWS CLI sync:

A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.

--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
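The two documented rules above can be written out as a small Python model. This illustrates the behaviour the docs describe; it is not the CLI's actual source, and `FileInfo` and `needs_upload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int     # bytes
    mtime: float  # seconds since the epoch (local mtime or S3 LastModified)


def needs_upload(local: FileInfo, remote: Optional[FileInfo], size_only: bool = False) -> bool:
    """Model of the local-to-S3 decision described in the `aws s3 sync` docs."""
    if remote is None:
        return True  # not present under the bucket and prefix
    if local.size != remote.size:
        return True  # size differs
    if size_only:
        return False  # --size-only: equal size counts as in sync
    return local.mtime > remote.mtime  # default: the local copy is newer


uploaded = FileInfo(size=2048, mtime=1_000_000.0)  # S3 LastModified from the last sync
rebuilt = FileInfo(size=2048, mtime=1_000_600.0)   # same bytes, rewritten by the build

print(needs_upload(rebuilt, uploaded))                  # True
print(needs_upload(rebuilt, uploaded, size_only=True))  # False
print(needs_upload(rebuilt, None, size_only=True))      # True
```

The model also shows the trade-off: with `size_only=True`, a same-length edit (one word swapped for another of equal length) keeps the sizes equal and is reported as not needing an upload.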

You want the `--size-only` option, which looks only at the file size, not the last modified date. This is perfect for an asset build system that frequently changes the last modified date but not the actual contents of the files (I ran into this with webpack builds, where things like fonts kept syncing even though the file contents were identical). If your build method doesn't incorporate a hash of the contents into the filename, you can run into problems: a build that emits a same-sized file with different contents won't be uploaded. So watch out for that.

I manually tested adding a new file that wasn't in the remote bucket, and it is indeed uploaded with `--size-only`.
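As a rough sketch of the content-hash naming recommended above (bundlers such as webpack can do this for you with the `[contenthash]` filename placeholder; the helper `hashed_name` and the choice of SHA-256 truncated to 8 hex digits are my assumptions):

```python
import hashlib
import os


def hashed_name(path: str, digits: int = 8) -> str:
    """Return the file's basename with a short content hash before the extension.

    Example shape: main.css -> main.<8 hex digits>.css
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:digits]
    base, ext = os.path.splitext(os.path.basename(path))
    return f"{base}.{digest}{ext}"
```

Because any content change, even one that keeps the size identical, produces a new filename, the changed asset appears as a file that doesn't exist in the bucket yet, so `--size-only` still uploads it. Your build also has to rewrite the references to the asset in your HTML to use the new name.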

Cymen
  • Hm... but what if I change the word "lump" to "pump" in an HTML file, or some tiny change like that, that won't change the file size? – Costa Michailidis Dec 20 '18 at 00:05
  • @Costa No, it won't. But I would recommend using a build system that appends hashes to the filenames. At least that works great for, say, CSS and JavaScript files. In my projects, I usually only have one root `index.html` file, so I'd just sync that as part of my deploy command. But if you have a lot of HTML files, you'd want to work around that by syncing them differently. – Cymen Dec 20 '18 at 01:12
  • Gotcha. That's a fine strategy : ) I wish S3 just stored a hash of the file contents as a way to check for changes. I wonder if I could implement that on my end... o _ O – Costa Michailidis Dec 20 '18 at 15:57
  • @Costa I agree: that would be the best way forward if S3 had that option, similar to rsync and other syncing tools. Doing it yourself is an interesting idea and seems like it would work (you'd just have to decide where to store the map of filename to hash, i.e. put it in the git repo, put it up on S3 separately, or only deploy from one server and keep it local to that, or ...). – Cymen Dec 20 '18 at 17:41
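The do-it-yourself idea from these comments could look roughly like this: hash every file in the build output, compare against a manifest saved after the previous deploy, and upload only the files whose hash changed. This is a hypothetical sketch; the function names, SHA-256, and storing the manifest as a local JSON file are my choices (the comment lists other places the map could live). Keep the manifest outside the site directory so it isn't hashed and uploaded along with the site.

```python
import hashlib
import json
import os


def hash_tree(site_dir: str) -> dict[str, str]:
    """Map each file's relative, '/'-separated path to its SHA-256 hex digest."""
    hashes = {}
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, site_dir).replace(os.sep, "/")
            with open(path, "rb") as f:
                hashes[rel] = hashlib.sha256(f.read()).hexdigest()
    return hashes


def load_manifest(manifest_path: str) -> dict[str, str]:
    """Read the hashes saved after the previous deploy (empty if there is none yet)."""
    if not os.path.exists(manifest_path):
        return {}
    with open(manifest_path) as f:
        return json.load(f)


def save_manifest(manifest_path: str, hashes: dict[str, str]) -> None:
    """Persist the current hashes for the next deploy to compare against."""
    with open(manifest_path, "w") as f:
        json.dump(hashes, f, indent=2, sort_keys=True)


def changed_files(current: dict[str, str], previous: dict[str, str]) -> list[str]:
    """Files that are new or whose contents changed since the saved manifest."""
    return sorted(rel for rel, digest in current.items() if previous.get(rel) != digest)
```

Each path returned by `changed_files` could then be uploaded individually (for example with `aws s3 cp`). Calling `save_manifest` only after those uploads succeed means a failed deploy is simply retried the next time.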

This question is a bit dated, but I'll contribute nonetheless for folks arriving here via Google.

I agree with the accepted answer. For additional context, `aws s3 sync` behaves differently from standard Linux sync tools in a number of ways. On Linux, tools can compute an MD5 hash to determine whether a file has changed. `aws s3 sync` does not do this, so it can only decide based on size and/or timestamp. What's worse, AWS does not preserve the original timestamp when transferring in either direction, so the timestamp is ignored when syncing to local and only used when syncing to S3.
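One nuance: S3 does keep a hash-like value for each object, the ETag, and for objects uploaded in a single part without SSE-KMS or SSE-C encryption it is the MD5 of the contents. `aws s3 sync` just doesn't use it in its comparison. If you fetch ETags yourself (for example from a bucket listing), a comparison could look like the sketch below. `md5_matches_etag` is a hypothetical helper. It recognizes multipart ETags (large files the CLI uploads in parts) by their `-<part count>` suffix, but it cannot detect the encrypted cases, where the ETag is not an MD5 even though it looks like one.

```python
import hashlib
from typing import Optional


def md5_matches_etag(data: bytes, etag: str) -> Optional[bool]:
    """Compare local bytes with an S3 ETag value.

    Returns None when the ETag is not a plain MD5 (multipart uploads use the
    form "<hex>-<part count>"), otherwise True or False.
    """
    etag = etag.strip('"')  # S3 returns ETags wrapped in double quotes
    if "-" in etag:
        return None
    return hashlib.md5(data).hexdigest() == etag
```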

Gary Morris