
So I'm aware that Amazon S3 doesn't really have directories. My question is: does this make it impossible to reliably get the last-modified timestamp of a "directory" in S3?

I know you can get the last-modified date of a file, as in this question.

I say "reliably" because it would be possible to define the latest last-modified timestamp of a file inside a directory as the last-modified timestamp of the directory. But that's not really accurate, since if a file inside a directory gets deleted, it wouldn't register as a change to that directory (indeed the deletion might cause the last-modified date to go backwards in time).
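For reference, the "latest last-modified file" definition described above can be sketched as follows. This is only a minimal sketch: it assumes boto3 and its `list_objects_v2` paginator, and the names `latest_modified`, `iter_objects`, and the sample keys are hypothetical. The sample data also shows the problem with the definition: deleting the newest file moves the "directory" timestamp backwards.

```python
from datetime import datetime, timezone


def latest_modified(objects):
    """Return the newest LastModified among object records, or None if empty."""
    return max((obj["LastModified"] for obj in objects), default=None)


def iter_objects(bucket, prefix):
    """Yield object records under a prefix (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so latest_modified works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match the prefix.
        yield from page.get("Contents", [])


# Hypothetical records shaped like list_objects_v2 "Contents" entries.
sample = [
    {"Key": "logs/a.txt", "LastModified": datetime(2015, 9, 1, tzinfo=timezone.utc)},
    {"Key": "logs/b.txt", "LastModified": datetime(2015, 9, 28, tzinfo=timezone.utc)},
]
print(latest_modified(sample))      # newest timestamp, from logs/b.txt
print(latest_modified(sample[:1]))  # with logs/b.txt gone, the value is earlier
```

Against a real bucket this would be called as `latest_modified(iter_objects("my-bucket", "logs/"))`, where the bucket and prefix are placeholders.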

We're using boto to scrape S3.

Eli Rose
  • The best you will do is the creation date of the oldest file in the directory. One option moving forward would be to add some sort of anchor file, which you never delete, whenever a directory is created. You could retroactively create the anchor files in existing directories based on the current oldest file, and your data would improve over time. – Chris Montanaro Sep 28 '15 at 19:43

2 Answers


If it's really important for you to know this, you could develop a solution using S3 event notifications. Each time a file is put into or deleted from a folder, you can have S3 publish to an SNS topic or invoke a Lambda function, and use that event to update a table or log somewhere, so the information is there when you need it.

It's probably not a ton of work, and if it's critical to know, it is an avenue worth exploring.

http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
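A rough sketch of what the Lambda side might look like, assuming the function is invoked directly by S3 (events routed through SNS arrive wrapped in an SNS message and would need unwrapping first). The `folder_last_changed` dict stands in for whatever table or log you actually use; it and the `folder_of` and `handler` names are hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical in-memory store; a real deployment would use a database table.
folder_last_changed = {}


def folder_of(key):
    """Return the 'folder' part of a key: everything up to the last '/'."""
    head, sep, _ = key.rpartition("/")
    return head + sep  # "" for keys at the bucket root


def handler(event, context=None):
    """Record the event time of every put or delete against the key's folder."""
    for record in event.get("Records", []):
        # Keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        folder = folder_of(key)
        event_time = record["eventTime"]  # ISO-8601 UTC string
        # Events may arrive out of order, so keep only the latest time seen.
        if event_time > folder_last_changed.get(folder, ""):
            folder_last_changed[folder] = event_time
    return folder_last_changed


# Hypothetical event containing only the fields the handler reads.
sample_event = {
    "Records": [
        {"eventName": "ObjectCreated:Put", "eventTime": "2015-09-28T19:43:00.000Z",
         "s3": {"object": {"key": "reports/2015/q3+summary.csv"}}},
        {"eventName": "ObjectRemoved:Delete", "eventTime": "2015-09-29T08:00:00.000Z",
         "s3": {"object": {"key": "reports/2015/old.csv"}}},
    ]
}
print(handler(sample_event))
```

Note that this only stamps the immediate folder. If a change should also count as a change to parent folders, the handler would have to walk up the key and update each ancestor prefix as well.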

E.J. Brennan

Since what we label a directory is just part of the object name, it has no creation time, modification time, etc.; it does not exist as an entity of its own. Each object has a name, and when that name contains '/', client applications treat the character as a separator, split the name, and make it look like a path. As you suggested, there is no directory, and this is where S3 really differs from a traditional file system and from how end users interact with one.
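To illustrate, here is a small sketch of the splitting that a client might do to present flat key names as folders. The function name `list_level` and the sample keys are made up for illustration. S3's list API can do the same grouping server-side when you pass a `Delimiter`, returning the folder-like groups as `CommonPrefixes`.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Mimic a folder listing: split keys under `prefix` into files and subfolders."""
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        name, sep, _ = rest.partition(delimiter)
        if sep:
            # There is more after the delimiter, so present this as a folder.
            folders.add(prefix + name + sep)
        else:
            files.append(key)
    return files, sorted(folders)


keys = ["photos/2015/a.jpg", "photos/2015/b.jpg", "photos/index.html", "readme.txt"]
print(list_level(keys))                    # (['readme.txt'], ['photos/'])
print(list_level(keys, prefix="photos/"))  # (['photos/index.html'], ['photos/2015/'])
```

The "folders" here are computed from the key names on every call; nothing is stored for them, which is why there is no timestamp to read.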

I suggest asking what you are actually trying to do and why the timestamp of the directory is important. E.J. Brennan's answer guesses at what you may be trying to do, and his suggestion is not a bad idea for that case. There is likely a different way to skin your cat.

cgseller