I have an observation that may help others working with S3, followed by a question. The example code here is in Groovy using the JetS3t Java library, but the concepts apply to any programming language.

I had found a lot of documentation here on Slashdot and elsewhere that claimed that S3 does not have a concept of subdirectories within buckets. This is mostly true. When you want to delete files you will find that you must first find them using:

  //assume we are looking for all files in 'stuff' directory
  files = s3.listObjects(bucket, 'stuff/', null)

Now if you delete those files, you'll still be left with something that looks very much like a subdirectory in the bucket: 'stuff/' is still listed. This caused me to question whether it was true that there really are no subdirectories. It turns out it is true that there aren't real subdirectories, but another object is masquerading as a subdirectory and showing up in the listing. With a little spelunking I determined that this is an ordinary S3 object whose key is the directory name with the special string _$folder$ appended. So you can delete it by doing the following (assuming the stuff example above):

   s3.deleteObject(bucket, 'stuff_$folder$')

Now you will no longer see any subdirectory listed for stuff in that bucket. Although I haven't tested this, I presume the stuff/ folder must already be empty before you try to delete the key 'stuff_$folder$'. What surprises me is that none of the posts here mention this, so anyone attempting to delete an entire subdirectory probably has the subdirectory itself still hanging around!

If you go back to my original listObjects call and do this instead:

   files = s3.listObjects(bucket, 'stuff', null) //note, no trailing slash

You will see stuff_$folder$ returned in the results. My problem with that is you may also get other objects whose keys start with "stuff" but are not contained in the "subdirectory", so you have to be careful. My preference is to pass 'stuff/' as the key and then deal with the 'stuff_$folder$' object separately.
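
To make the difference between the two prefixes concrete, here is a minimal in-memory sketch. It does not call S3 or JetS3t; it only assumes that a listing selects keys by a plain "starts with" test on the prefix. The class name, method name, and the key 'stuffing-recipe.txt' are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixMatchSketch {
    // Simulates prefix-only selection: keep every key that starts with the prefix.
    static List<String> keysWithPrefix(List<String> keys, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical bucket contents, made up for this example.
        List<String> keys = Arrays.asList(
            "stuff/a.txt",
            "stuff/b.txt",
            "stuff_$folder$",
            "stuffing-recipe.txt");

        // With the trailing slash, only the "directory" contents match.
        System.out.println(keysWithPrefix(keys, "stuff/"));

        // Without it, the folder marker matches, but so does the unrelated key.
        System.out.println(keysWithPrefix(keys, "stuff"));
    }
}
```

With the trailing slash only the two keys under stuff/ are selected; without it all four keys are, including the unrelated 'stuffing-recipe.txt'.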

This leads me to a final question. I cannot seem to get a clear explanation of what the final parameter in the listObjects(bucket, key, delimiter) call means. What exactly is a "delimiter"? It doesn't seem to mean "file separator" (as in '/'). I've searched and can't find an example that illustrates what it means or how it is used. If there is any way to improve the utility and flexibility of listObjects, I'd like to know. Can someone provide an example that illustrates the usage and meaning of the delimiter parameter? I'm sure it's something simple and I just can't find a good example of it.

Rich Sadowsky

1 Answer

Delimiter is a clumsy name. It makes more sense if you consider it a suffix. From the S3 documentation (http://aws.amazon.com/releasenotes/Amazon-S3/213), or, if you prefer a slightly different explanation, http://www.bucketexplorer.com/documentation/amazon-s3--search-on-objects-in-bucket.html:

Groups of keys that share a common prefix terminated by a special delimiter can now be rolled-up by that prefix for the purposes of listing. This allows applications to browse their keys hierarchically, much like how you would navigate through directories in a filesystem.

For example, if you had a bucket that contained the following keys (named with embedded slash delimiters to simulate directories)

  photos/2006/index.html
  photos/2006/January/img0001.jpg
  ...
  photos/2006/January/img0999.jpg
  photos/2006/February/img1000.jpg
  ...

A list query with Prefix="photos/2006/" and Delimiter="/" would return the keys and "subdirectories" at the photos/2006 level (index.html, January, February, ...) but would not include any of the .jpg keys at deeper levels.
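
The roll-up described in the quoted passage can be sketched in memory. This is not the S3 or JetS3t implementation, only an illustration under one assumption: a key that contains the delimiter after the prefix is replaced by its "common prefix", meaning everything up to and including the first such delimiter. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DelimiterListingSketch {
    // Result of a simulated listing: plain keys plus rolled-up common prefixes.
    static class Listing {
        final List<String> keys = new ArrayList<>();
        final Set<String> commonPrefixes = new LinkedHashSet<>();
    }

    static Listing list(List<String> allKeys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : allKeys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested prefix
            }
            String rest = key.substring(prefix.length());
            // A null or empty delimiter means no roll-up at all.
            int idx = (delimiter == null || delimiter.isEmpty()) ? -1 : rest.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter after the prefix: report the key itself.
                result.keys.add(key);
            } else {
                // Roll the key up to the end of the first delimiter after the prefix.
                result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys taken from the quoted example (elided entries omitted).
        List<String> keys = Arrays.asList(
            "photos/2006/index.html",
            "photos/2006/January/img0001.jpg",
            "photos/2006/January/img0999.jpg",
            "photos/2006/February/img1000.jpg");

        Listing rolledUp = list(keys, "photos/2006/", "/");
        System.out.println("keys: " + rolledUp.keys);
        System.out.println("common prefixes: " + rolledUp.commonPrefixes);

        Listing flat = list(keys, "photos/2006/", null);
        System.out.println("keys without delimiter: " + flat.keys);
    }
}
```

With Delimiter="/" the sketch reports one key (photos/2006/index.html) and two common prefixes (photos/2006/January/ and photos/2006/February/). With a null delimiter it reports all four keys and no common prefixes, which matches passing null as the last argument in the question's listObjects calls.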

So think of it as a suffix. Your delimiter could be .html, .jpg, or something like that.

Pete - MSFT