
I have the following folder structure that I would like to create in S3:

/demo/app/a.txt

Via Console:

Create demo and app using "Create Folder", then upload a.txt into demo/app

Via CLI:

aws s3 sync . s3://<my-bucket>/, where . (current dir) has demo folder

--

Now when I run,

aws s3 ls s3://<my-bucket> --recursive, the result is quite interesting/puzzling!

Output:

# created & uploaded from console
demo/
demo/app/
demo/app/a.txt

# from CLI
demo/app/a.txt

Clearly, there aren't any prefixes/objects for demo/ and demo/app/ created using CLI upload.

--

This answer helped me understand that "Create Folder" from console results in creating a 0-byte file when the folder is empty. But this isn't the case when uploaded via CLI.

So, how do I mimic the 0-byte file behavior for CLI uploads? In other words, match the prefixes!

The other use case is navigating across directories from a browser, which does not work because the prefixes aren't available as objects. For example:

  1. Uploaded from console: https://<domain>/demo/ shows app directory.
  2. Uploaded from CLI: https://<domain>/demo/ results in a NoSuchKey error.

--

Note: I'm using a CloudFront distribution to access S3 data (if that helps)!

John Rotenstein
Prathap
  • What happens when you use a trailing forward slash, e.g. `aws s3 ls s3://<my-bucket>/`? (Note the last forward slash.) – Jamie_D Apr 21 '20 at 19:36
  • @Jamie_D, It's the same output. – Prathap Apr 21 '20 at 19:45
  • Something you should really understand that while S3 has an appearance (and a confusingly named "create folder" in the UI) of directories, it's actually a flat object store. S3 contains one single item "demo/app/a.txt". "demo" and "app" do not exist on their own. – jordanm Apr 21 '20 at 20:47
  • You can't do this for CLI uploads. If you really need it (most people don't), then script the upload to create 0-sized objects representing the intermediate folder structure you want. – jarmod Apr 21 '20 at 21:31
  • @jordanm, completely agree. I just used the notion of "folder" to explain the problem. With the console upload, coz of the presence of 0-byte objects, it gives the impression of hops from `/` to get to `/demo/app/a.txt`, which is sort of helpful while listing on a browser. – Prathap Apr 22 '20 at 00:34
  • @jarmod, thanks! Could you share a sample script you're referring to? *Note: I wouldn't want to see these (as distinct) files in the bucket; in other words, achieve the exact same thing as done from/by AWS console.* – Prathap Apr 22 '20 at 00:37
  • @Prathap have provided script in an answer. – jarmod Apr 22 '20 at 15:10

2 Answers


Amazon S3 is a flat object-storage system. It is not a filesystem and it does not have the concept of folders or directories. Rather, the Key (filename) of an object contains the full path of the object.

The easiest way to use S3 is to pretend that folders exist, but not actually create them. For example, you could copy a file to S3 like this:

aws s3 cp a.txt s3://my-bucket/demo/app/a.txt

This will work successfully, even if there is no directory called demo or app, because directories/folders do not exist.

Instead, Amazon S3 provides the concept of CommonPrefixes, which a ListObjects() API call returns when you specify a delimiter (normally /). They are folder-like names derived from the Keys of the objects, which gives you the programmatic equivalent of directories without needing them to exist.

If you wish to present a series of hierarchical directories, use the list of CommonPrefixes to build that view. This will work even when there are no zero-length files, because S3 looks at the Keys of the objects, not actual directories.
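To illustrate that the hierarchy is implied by the Keys alone, here is a minimal sketch that derives the folder-like prefixes from a plain list of Keys on stdin. The function name `parent_prefixes` is hypothetical, and this is only a client-side illustration of the idea, not how S3 computes CommonPrefixes internally.

```shell
# Hypothetical helper: read object Keys (one per line) from stdin and print
# every distinct folder-like prefix they imply, in first-seen order.
parent_prefixes() {
  awk -F/ '{
    prefix = ""
    # Every path component except the last one is a "folder" level.
    for (i = 1; i < NF; i++) {
      prefix = prefix $i "/"
      if (!(prefix in seen)) {
        seen[prefix] = 1
        print prefix
      }
    }
  }'
}
```

For example, `printf '%s\n' demo/app/a.txt | parent_prefixes` prints `demo/` and `demo/app/`, even though no such objects exist.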

Here are some examples:

aws s3 cp a.txt s3://my-bucket/demo/app/a.txt
upload: ./a.txt to s3://my-bucket/demo/app/a.txt                  

aws s3api list-objects-v2 --bucket my-bucket

{
    "Contents": [
        {
            "Key": "demo/app/a.txt",
            "LastModified": "2020-04-22T01:11:20+00:00",
            "ETag": "\"802776735eb3ddcf03962ae47e08ed13\"",
            "Size": 211,
            "StorageClass": "STANDARD"
        }
    ]
}

aws s3api list-objects-v2 --bucket my-bucket --delimiter '/'

{
    "CommonPrefixes": [
        {
            "Prefix": "demo/"
        }
    ]
}

aws s3api list-objects-v2 --bucket my-bucket --delimiter '/' --prefix 'demo/'
{
    "CommonPrefixes": [
        {
            "Prefix": "demo/app/"
        }
    ]
}

Notice how the last two commands specify a delimiter, so a list of CommonPrefixes is returned. This is how you could step through a hierarchy of directories (that don't exist).
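Stepping through the prefixes level by level could be scripted along these lines. This is a sketch, assuming bash and a configured AWS CLI; the function names `list_subfolders` and `walk_prefixes` are hypothetical, and it relies on `--output text` printing the prefixes tab-separated (and `None` when there are no CommonPrefixes).

```shell
# Hypothetical helper: print the immediate "subfolders" (CommonPrefixes)
# below a prefix, one per line. Usage: list_subfolders <bucket> [prefix]
list_subfolders() {
  local bucket="$1" prefix="${2:-}"
  local args=(--bucket "$bucket" --delimiter '/'
              --query 'CommonPrefixes[].Prefix' --output text)
  # Only pass --prefix when one was given (top level has none).
  if [ -n "$prefix" ]; then
    args+=(--prefix "$prefix")
  fi
  # Text output is tab-separated; "None" means no CommonPrefixes at this level.
  aws s3api list-objects-v2 "${args[@]}" \
    | tr '\t' '\n' \
    | grep -v -e '^None$' -e '^$' || true
}

# Hypothetical helper: walk the whole pseudo-directory tree depth-first.
walk_prefixes() {
  local bucket="$1" prefix="${2:-}" child
  list_subfolders "$bucket" "$prefix" | while IFS= read -r child; do
    printf '%s\n' "$child"
    walk_prefixes "$bucket" "$child"
  done
}
```

Calling `walk_prefixes my-bucket` against the bucket above would be expected to print `demo/` followed by `demo/app/`, with no zero-byte objects involved. It issues one API call per prefix, so it is meant for illustration rather than for very large buckets.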

John Rotenstein
  • Oh this is interesting. I believe this might serve the purpose of navigating through prefixes to list in the browser. However, I'll need to try this out as an enhancement, since I have a mixed-mode setup, i.e. files uploaded through CLI & Console, thereby requiring to modify the current logic. Thanks for the detailed answer & possibly the right approach to navigate. – Prathap Apr 24 '20 at 12:06

If you really need a folder structure in S3 that mirrors a local folder structure, you could try some variant of the following, using shell scripting and the awscli on Linux or Mac:

  1. list all local folders
  2. transform that list into something matching S3 naming conventions
  3. create a zero byte file at each folder name in S3

List local folders:

find . -type d

Example output:

.
./dogs
./dogs/cute
./dogs/fierce
./cats
./cats/white
./cats/black

Transform that list to drop the . entry and strip the leading ./, so that ./dogs/cute becomes dogs/cute:

find . -type d | grep -v '^\.$' | sed 's/^\.\///g'

Example output:

dogs
dogs/cute
dogs/fierce
cats
cats/white
cats/black

Finally, put it all together and create a zero-byte object for each folder (note the trailing / appended to each key):

find . -type d \
    | grep -v '^\.$' \
    | sed 's/^\.\///g' \
    | xargs -L 1 -I % \
    aws s3api put-object --bucket mybucket --key %/ --content-length 0

Check the result in S3:

aws s3 ls s3://mybucket --recursive

Results:

2020-04-21 21:00:05          0 cats/black/
2020-04-21 21:00:05          0 cats/white/
2020-04-21 21:00:04          0 cats/
2020-04-21 21:00:03          0 dogs/cute/
2020-04-21 21:00:04          0 dogs/fierce/
2020-04-21 21:00:02          0 dogs/
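The same steps can also be wrapped in functions with a read loop instead of xargs, which copes with spaces in folder names. This is a sketch under assumptions: `folder_keys` and `create_folder_markers` are hypothetical names, `-mindepth` requires GNU or BSD find, and directory names containing newlines are not handled.

```shell
# Hypothetical helper: print one S3 "folder" key (with trailing slash) for
# every directory below the given local root, e.g. ./dogs/cute -> dogs/cute/
folder_keys() {
  local root="${1:-.}"
  (cd "$root" && find . -mindepth 1 -type d) \
    | sed -e 's|^\./||' -e 's|$|/|' \
    | LC_ALL=C sort
}

# Hypothetical helper: create a zero-byte marker object for each folder key.
# Usage: create_folder_markers <bucket> [local-root]
create_folder_markers() {
  local bucket="$1" root="${2:-.}" key
  folder_keys "$root" | while IFS= read -r key; do
    aws s3api put-object --bucket "$bucket" --key "$key" --content-length 0 >/dev/null
  done
}
```

Running `folder_keys .` first shows the keys that would be created; `create_folder_markers mybucket .` then uploads them.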
jarmod
  • Awesome!! This does exactly what I was expecting; in other words create the 0-length files similar to how S3 console does. Thanks for the detailed answer. Could you explain `xargs -L 1 -I % aws s3api put-object --bucket mybucket --key %/ --content-length 0` part of the answer? I have very little scripting knowledge, sorry! – Prathap Apr 24 '20 at 12:01
  • You can use xargs to run a given command (`aws s3api`) against a number of lines of input (the desired folder names). xargs takes input from stdin which in this case is the modified find results for folders. The `-L 1` option says use one line (i.e. folder) at a time, and `-I %` means substitute the line content into `%` in the following command. So it executes `aws s3api put-object ...` multiple times, once per folder, and sets the `--key` parameter to `foldername/`. – jarmod Apr 24 '20 at 15:57
  • Great, thanks again! Going ahead with this answer as the accepted one. However, I must say that the answer provided by @John Rotenstein also achieves the pseudo directory navigation without requiring 0-byte objects. Thanks to both! :) – Prathap Apr 25 '20 at 10:45
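The xargs substitution described in the comments above can be previewed without touching S3 by putting `echo` in front of the command. This is a minimal sketch: `preview_put_commands` is a hypothetical name, and `-L 1` is left out here on the assumption that `-I` already runs the command once per input line.

```shell
# Hypothetical helper: read folder names from stdin and print (not run)
# the put-object command that would be executed for each one.
preview_put_commands() {
  xargs -I % echo aws s3api put-object --bucket mybucket --key %/ --content-length 0
}

# Dry run with three sample folder names; no AWS call is made.
printf '%s\n' dogs dogs/cute cats | preview_put_commands
```

Each input line replaces `%`, so the first printed command ends in `--key dogs/ --content-length 0`.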