110

I am using the AWS CLI to list the files in an AWS S3 bucket using the following command (aws s3 ls):

aws s3 ls s3://mybucket --recursive --human-readable --summarize

This command gives me the following output:

2013-09-02 21:37:53   10 Bytes a.txt
2013-09-02 21:37:53  2.9 MiB foo.zip
2013-09-02 21:32:57   23 Bytes foo/bar/.baz/a
2013-09-02 21:32:58   41 Bytes foo/bar/.baz/b
2013-09-02 21:32:57  281 Bytes foo/bar/.baz/c
2013-09-02 21:32:57   73 Bytes foo/bar/.baz/d
2013-09-02 21:32:57  452 Bytes foo/bar/.baz/e
2013-09-02 21:32:57  896 Bytes foo/bar/.baz/hooks/bar
2013-09-02 21:32:57  189 Bytes foo/bar/.baz/hooks/foo
2013-09-02 21:32:57  398 Bytes z.txt

Total Objects: 10
   Total Size: 2.9 MiB

However, this is my desired output:

a.txt
foo.zip
foo/bar/.baz/a
foo/bar/.baz/b
foo/bar/.baz/c
foo/bar/.baz/d
foo/bar/.baz/e
foo/bar/.baz/hooks/bar
foo/bar/.baz/hooks/foo
z.txt

How can I omit the date, time and file size in order to show only the file list?

Borealis

13 Answers

147

You can't do this with just the aws command, but you can easily pipe it to another command to strip out the portion you don't want. You also need to remove the --human-readable flag to get output that's easier to parse, and the --summarize flag to remove the summary data at the end.

Try this:

aws s3 ls s3://mybucket --recursive | awk '{print $4}'

Edit: to take spaces in filenames into account:

aws s3 ls s3://mybucket --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//'
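
A sed-only variant avoids awk's field re-joining, which collapses runs of spaces inside filenames into a single space (see the comments below). This is a sketch, assuming the default (non-human-readable) output columns of date, time, size, key:

aws s3 ls s3://mybucket --recursive | sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} +[0-9:]{8} +[0-9]+ //'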
Mark B
  • Partially correct answer - it will not work if there's a whitespace in the filename. Also `awk '{print $5}'`, not `$4` – Michal Gasek Apr 23 '16 at 17:29
  • @MichalGasek if you remove the `--human-readable` flag like I specified, then it's $4, not $5. – Mark B Apr 23 '16 at 17:31
  • Right, it's $4 then. It still should deal with whitespaces in filenames... otherwise surprises come up. – Michal Gasek Apr 23 '16 at 17:36
  • @MichalGasek I invite you to post an answer that deals with spaces in filenames. – Mark B Apr 23 '16 at 17:41
  • Don't think it's worth another answer really. Piping through perl and matching after the 3rd whitespace could for example work fine here: `aws s3 ls s3://mybucket --recursive | perl -ne '($key)=$_=~/^[\d\-]+\s+[\d\:]+\s+\d+\s(.+?)$/g; print "$key\n";'` – Michal Gasek Apr 23 '16 at 17:49
  • Alternate non-awk solution: `aws s3 ls s3://mybucket --recursive | tr -s ' ' | cut -d' ' -f4` – LateralFractal Oct 27 '16 at 04:32
  • I can't verify if this works for recursive, but since the "simple" version won't work for spaces in filenames, it seems like a fragile solution, and the other is needlessly complex. Instead, cut on chars, which should be robust enough until the CLI output format changes: `aws s3 ls s3://mybucket | cut -c32-` (optionally add recursive & verify it still works) – michael Nov 03 '17 at 09:51
  • Not all heroes wear capes – Ste Oct 05 '20 at 17:14
  • The "edit" version's command will mangle runs of spaces: if your filename has 2 or more consecutive spaces, it will collapse them into 1. This wasted a lot of time for me. – Arman Yeghiazaryan Oct 18 '22 at 23:34
41

Use the s3api with jq (AWS docs: aws s3api list-objects):

This mode is always recursive.

$ aws s3api list-objects --bucket "bucket" | jq -r '.Contents[].Key'
a.txt
foo.zip
foo/bar/.baz/a
[...]

You can filter subdirectories by adding a prefix (here the foo directory). The prefix must not start with a /.

$ aws s3api list-objects --bucket "bucket" --prefix "foo/" | jq -r '.Contents[].Key'
foo/bar/.baz/a
foo/bar/.baz/b
foo/bar/.baz/c
[...]

jq Options:

  • -r = raw mode, no quotes in the output
  • .Contents[] = iterate over the Contents object array
  • .Key = get every Key field (this does not produce a valid JSON array, but we are in raw mode, so we don't care)
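
One caveat worth hedging: if the bucket or prefix matches no objects, the response contains no Contents key at all, and .Contents[] makes jq fail. jq's optional iterator tolerates that case:

$ aws s3api list-objects --bucket "bucket" | jq -r '.Contents[]? | .Key'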

Addendum:

You can use the pure AWS CLI, but the values will be separated by \x09 = Horizontal Tab (AWS: Controlling Command Output from the AWS CLI - Text Output Format)

$ aws s3api list-objects --bucket "bucket" --prefix "foo/" --query "Contents[].Key" --output text
foo/bar/.baz/a   foo/bar/.baz/b   foo/bar/.baz/c   [...]

AWS CLI Options:

  • --query "Contents[].Key" = Query Contents Object Array and get every Key inside
  • --output text = Output as Tab delimited Text with now Quotes

Addendum based on Guangyang Li's comment:

Pure AWS CLI with New Line:

$ aws s3api list-objects --bucket "bucket" --prefix "foo/" --query "Contents[].{Key: Key}" --output text
foo/bar/.baz/a
foo/bar/.baz/b
foo/bar/.baz/c
[...]
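
The same flags should also work with the newer list-objects-v2 call, which is the API AWS recommends for new development (a sketch under that assumption):

$ aws s3api list-objects-v2 --bucket "bucket" --prefix "foo/" --query "Contents[].{Key: Key}" --output text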
notes-jj
  • Very nice. Or `aws s3api list-buckets | jq -r '.Buckets[].Name'` – f_i Mar 31 '19 at 06:44
  • I like the pure AWS CLI one and actually you can do it with `--query 'Contents[].{Key: Key}'`. Then it will be one record per line. – Guangyang Li Dec 05 '19 at 19:13
  • Using `s3api` exports a JSON list and cuts specific values via `jq`; this means that if you have tons of data, you also need a lot of RAM, otherwise it will be overflown and the script will eventually fail. The `s3 ls` approach is a much better option. – Arman Yeghiazaryan Oct 18 '22 at 23:38
12

A simple filter would be:

aws s3 ls s3://mybucket --recursive | perl -pe 's/^(?:\S+\s+){3}//'

This removes the date, time, and size, leaving only the full path of the file. It also works without --recursive, and it handles filenames containing spaces.
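
If perl is not available, a POSIX sed equivalent of the same idea (strip the first three space-delimited columns, keep everything after them, spaces included) would be this sketch:

aws s3 ls s3://mybucket --recursive | sed -E 's/^([^ ]+ +){3}//'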

Celogeek San
5

Simple Way

aws s3 ls s3://mybucket --recursive --human-readable --summarize | cut -c 29-
Tech Support
  • Currently, for me, `aws s3 ls` outputs such that you'd want to cut on `-c32`, not `-c29`; not sure if it's my data or a change in output format. (I don't actually have subfolders.) This is true for `--human-readable` or plain default output; columns are in the same place. But really, there's no need for human-readable in this case. And in either case you'd want to omit the `--summarize`. In short, `aws s3 ls s3://mybucket | cut -c32-` (and `--recursive` only if desired) – michael Nov 03 '17 at 09:47
  • Note that all the other answers here that attempt to cut based on spaces (awk, cut, whatever) are not going to work if there are spaces in the filenames. – michael Nov 03 '17 at 09:48
  • This is the cleanest way to do it (as for michael with -c32) – Yannick Wurm Apr 08 '20 at 21:05
4

EDIT: After considering MultiDev's comment that the previous solution won't work with objects that have spaces in them, I used s3api instead of s3:

aws s3api list-objects --bucket mybucket --prefix myprefix --query 'Contents[].Key' | jq -rc '.[]'

The --prefix is optional.

jq extracts the raw elements (keys) from the returned array.

Use something like --query 'Contents[].{Key: Key, Size: Size}' to get more info, then format the output further with jq.
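
For example, a sketch that prints size and key as tab-separated columns (the jq filter just interpolates the two fields produced by the query above):

aws s3api list-objects --bucket mybucket --query 'Contents[].{Key: Key, Size: Size}' | jq -r '.[] | "\(.Size)\t\(.Key)"'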


OLD Solution: aws s3 ls s3://mybucket --recursive | rev | cut -d" " -f1 | rev

I would suggest not depending on the spacing and not fetching the 4th field.

You technically want the last field, regardless of which position it is in.

So it's safer to use rev to your advantage: rev reverses its string input character by character, so when you pipe the aws s3 ls output to rev, everything is reversed, including the positions of the fields, and the last field always becomes the first field. Instead of figuring out where the last field would be, you just rev, take the first field, then rev again, because the characters in that field are reversed as well.

Example:

2013-09-02 21:32:57 23 Bytes foo/bar/.baz/a becomes a/zab./rab/oof setyB 32 75:23:12 20-90-3102

then cut -d" " -f1 would retrieve the first field a/zab./rab/oof

then rev again to get foo/bar/.baz/a

Shoukry
3

My Solution

List only files recursively using the AWS CLI.

aws s3 ls s3://myBucket --recursive | awk 'NF>1{print $4}' | grep .

grep . removes the empty lines (for a PRE line, $4 is empty, so awk prints a blank line that grep . then drops).


Example: aws s3 ls s3://myBucket

                           PRE f5c10c1678e8484482964b8fdcfe43ad/
                           PRE f65b94ad31734135a61a7fb932f7054d/
                           PRE f79b12a226b542dbb373c502bf125ffb/
                           PRE logos/
                           PRE test/
                           PRE userpics/
2019-05-14 10:56:28       7754 stage.js

Solution: aws s3 ls s3://myBucket --recursive | awk 'NF>1{print $4}' | grep .

stage.js
nsantana
2

A simple command would be:

aws s3 ls s3://mybucket --recursive --human-readable --summarize | cut -d ' ' -f 8

If you also need the timestamp, adjust the field numbers accordingly.
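
Because cut -d ' ' treats every single space as a delimiter, the correct field number depends on the column padding, which can shift. A sketch that squeezes the padding first may be less brittle (note that tr -s also collapses repeated spaces inside filenames, and --summarize is dropped so the summary lines don't leak through):

aws s3 ls s3://mybucket --recursive --human-readable | tr -s ' ' | cut -d ' ' -f 5-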

skipper21
2

An S3 bucket may contain not only files but also prefixes (folder placeholders). If you use --recursive, the listing will include those bare prefixes as well as the files. If you only care about the files within the bucket, and not the prefixes, this should work (for the prefixes-only case, see the sketch below).

aws s3 ls s3://$S3_BUCKET/$S3_OPTIONAL_PREFIX/ --recursive | awk '{ if($3 >0) print $4}'

awk's $3 is the size of the file; for a prefix placeholder it is 0. Note that an empty file also has size 0, so this skips empty files as well.
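
Conversely, for only the prefixes, a non-recursive listing marks each prefix with PRE in the first column, so a sketch along the same lines (assuming no spaces in the prefix names) would be:

aws s3 ls s3://$S3_BUCKET/ | awk '$1 == "PRE" {print $2}'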

Anjan Biswas
2

If your files don't have spaces, then this is the easiest way to do it:

aws s3 ls s3://mybucket  | cut -c32-

Output is:

1.txt.gz
2.txt.gz
3.txt.gz

Instead of:

2021-12-15 23:05:44         36 1.txt.gz
2021-12-15 23:05:45         37 2.txt.gz
2021-12-15 23:05:46         39 3.txt.gz
Mostafa Wael
1

For only the file names, I find the easiest approach to be:

aws s3 ls s3://path/to/bucket/ | cut -d " " -f 4

This will cut the returned output at the spaces (cut -d " ") and return the fourth column (-f 4), which is the list of file names.

Michael Silverstein
1
How to display only files from aws s3 ls command?

1. Basic command

$ aws s3 ls s3://bucket --recursive

output :

2021-02-10 15:29:02          0 documents/
2021-02-10 15:29:02         18 documents/data/data.txt
2021-03-15 23:35:12          0 documents/data/my code.txt


2. To get only the keys from the S3 bucket, including keys containing spaces:

$ aws s3 ls s3://bucket --recursive | awk '{ $1=$2=$3=""; print $0}' | cut -c4-

output : 

documents/
documents/data/data.txt
documents/data/my code.txt

3. Removing "documents/" from result

$ aws s3 ls s3://bucket --recursive | awk '$0 !~ /\/$/ { $1=$2=$3=""; print $0}' | cut -c4-

output :

documents/data/data.txt
documents/data/my code.txt
linux_dev
  • The second clause's suggested bash command will mangle runs of spaces: if your file has 2 or more consecutive spaces, it will collapse them into 1. This wasted a lot of time for me. – Arman Yeghiazaryan Oct 18 '22 at 22:26
0

You can run the following command to list bucket names without any additional information. Wrapping Name in brackets ([Name]) makes the text output print one bucket name per line instead of a single tab-separated row:

aws s3api list-buckets --query "Buckets[].[Name]" --output text
Eralper
-1

It's just grep filtering by the starting symbol: "^-" means the line starts with the '-' symbol. Directories, on the other hand, start with the letter 'd'.

ls -Al | grep "^-"
Savvasenok