I have an amazon s3 bucket that has tens of thousands of filenames in it. What's the easiest way to get a text file that lists all the filenames in the bucket?
-
As alluded to by jldupont's comment on the answer provided by vdaubry, `boto.s3.bucketlistresultset.BucketListResultSet` addresses the "tens of thousands of filenames" condition mentioned in the question. – chb May 29 '13 at 09:01
-
Be aware that for buckets with a very large number of objects, say millions or billions, the coding/scripting approaches below will not work well. You should instead enable S3 Inventory and retrieve an inventory report. – jarmod Jan 31 '20 at 16:36
31 Answers
I'd recommend using boto. Then it's a quick couple of lines of python:
from boto.s3.connection import S3Connection

conn = S3Connection('access-key', 'secret-access-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
    print(key.name.encode('utf-8'))
Save this as list.py, open a terminal, and then run:
$ python list.py > results.txt

-
If you get boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden, make sure the user policy for the Access/Secret key has access to S3. – topherjaynes May 27 '14 at 23:44
-
I got a 403 error, and I had to follow these instructions in order to make it work: http://stackoverflow.com/a/22462419/1143558 – Ljubisa Livac Jul 12 '16 at 09:42
AWS CLI
Documentation for aws s3 ls
AWS has recently released its Command Line Tools. This works much like boto and can be installed using sudo easy_install awscli
or sudo pip install awscli
Once it is installed, you can then simply run:
aws s3 ls
Which will show you all of your available buckets
CreationTime Bucket
------------ ------
2013-07-11 17:08:50 mybucket
2013-07-24 14:55:44 mybucket2
You can then query a specific bucket for files.
Command:
aws s3 ls s3://mybucket
Output:
Bucket: mybucket
Prefix:
LastWriteTime Length Name
------------- ------ ----
PRE somePrefix/
2013-07-25 17:06:27 88 test.txt
This will show you all of your files.

-
Add the `--recursive` flag to see all objects under the specified directory – Chris Bloom Nov 04 '15 at 16:56
-
Is there a way to parse the names out? I am looking to make a list of files in an S3 bucket to enumerate over. – Casey Dec 23 '19 at 20:14
-
In addition, S3 encodes the filenames to be used as URLs; these are just raw filenames. – Casey Dec 23 '19 at 21:00
-
Can someone use this to list files in a public s3 bucket that they do not own? – Lucas Bustamante Sep 01 '22 at 20:41
s3cmd is invaluable for this kind of thing
$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket

-
`s3cmd` returns the filenames sorted by date. Is there any way I can make it return say only those files which have been added after `2015-10-23 20:46`? – SexyBeast Nov 19 '15 at 14:22
-
Note that if the filenames have spaces this has a small glitch, but I don't have the awk-foo to fix it – Colin D Jun 19 '18 at 01:49
Be careful, Amazon's list operation only returns 1000 files. If you want to iterate over all files you have to paginate the results using markers:
In Ruby, using aws-s3:
require 'aws/s3'

bucket_name = 'yourBucket'
marker = ""

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'your_access_key_id',
  :secret_access_key => 'your_secret_access_key'
)

loop do
  objects = AWS::S3::Bucket.objects(bucket_name, :marker => marker, :max_keys => 1000)
  break if objects.size == 0
  marker = objects.last.key
  objects.each do |obj|
    puts obj.key
  end
end
Hope this helps, vincent

-
boto handles paging, see https://github.com/boto/boto/blob/develop/boto/s3/bucket.py – jldupont Jul 06 '12 at 13:11
-
Thanks for this, I had a hard time finding how to set the marker. – Adrian Magdas Jun 13 '14 at 12:43
Update 15-02-2019:
This command will give you a list of all buckets in AWS S3:
aws s3 ls
This command will give you a list of all top-level objects inside an AWS S3 bucket:
aws s3 ls bucket-name
This command will give you a list of ALL objects inside an AWS S3 bucket:
aws s3 ls bucket-name --recursive
This command will place a list of ALL objects inside an AWS S3 bucket into a text file in your current directory:
aws s3 ls bucket-name --recursive | cat >> file-name.txt

-
This works but isn't really what I need. It just lists all of the "top-level" prefixes. Is there a way to get all objects in a bucket, prefixes and all? – rinogo Feb 14 '19 at 17:07
-
Update: The [answer by @sysuser](https://stackoverflow.com/a/36148415/114558) is what I needed. – rinogo Feb 14 '19 at 17:11
-
@rinogo It does not fit your needs maybe... but it works and that is what counts here. It fits other people's needs as a correct answer. – Khalil Gharbaoui Feb 14 '19 at 17:23
-
Like I said, it works - thank you! But it doesn't answer OP's question. OP asked for a way to "[list] all the filenames in the bucket". This only lists top-level objects, not *all* objects. – rinogo Feb 14 '19 at 17:28
-
Aha, but that is not hard to do. Just add '--recursive' to the command. I'll add it to my answer, thanks for pointing that out – Khalil Gharbaoui Feb 15 '19 at 09:21
There are a couple of ways you can go about it. Using Python:
import boto3
session = boto3.Session(aws_access_key_id, aws_secret_access_key)
s3 = session.resource('s3')
bucketName = 'testbucket133'
bucket = s3.Bucket(bucketName)
for obj in bucket.objects.all():
    print(obj.key)
Another way is using the AWS CLI:
aws s3 ls s3://{bucketname}
Example: aws s3 ls s3://testbucket133

-
if aws is already configured, one can replace lines 2 and 3 with `s3 = boto3.resource('s3')` – sinapan Mar 04 '19 at 16:59
-
If you have the environment variables placed, you do not need to use the variables in the `session` method. `AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']` `AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']` – Flavio Aug 01 '19 at 12:19
For Scala developers, here is a recursive function to execute a full scan and map the contents of an AmazonS3 bucket using the official AWS SDK for Java:
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    if (!listing.isTruncated) mapped.toList
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}
To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name, and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list.
For example
val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))
will return the full list of (key, owner) tuples in that bucket/prefix,
or
map(s3, "bucket", "prefix")(s => println(s))
as you would normally approach it with monads in functional programming.

-
There's a bug with this code. If the initial scan is truncated, the final return will only return `mapped.toList` without any of the previous `acc` – Mark Wang Apr 01 '17 at 06:49
-
Thanks - note that AmazonS3Client should now be just AmazonS3. – Anthony Holland Dec 18 '18 at 13:14
First make sure you are on an instance terminal and that the IAM user or role you are using has full access to S3. For example, I used an EC2 instance.
Install the AWS CLI:
pip3 install awscli
Then configure the AWS CLI:
aws configure
Then fill out the credentials, e.g.:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json (or just press enter)
Now, see all buckets:
aws s3 ls
Store all bucket names:
aws s3 ls > output.txt
See the full file structure in a bucket:
aws s3 ls bucket-name --recursive
Store the file structure of a bucket:
aws s3 ls bucket-name --recursive > file_Structure.txt
Hope this helps.

Following zach, I would also recommend boto, but I needed to make a slight change to his code:
import boto

conn = boto.connect_s3('access-key', 'secret-key')
bucket = conn.lookup('bucket-name')
for key in bucket:
    print(key.name)

-
The modification was necessary because the original code did not work at the time. – Datageek Jul 23 '13 at 09:54
-
`conn.lookup` returns `None` instead of throwing an `S3ResponseError(NoSuchBucket)` error – Ehtesh Choudhury Feb 21 '14 at 20:14
aws s3api list-objects --bucket bucket-name
For more details see here - http://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html

For Python's boto3, after having used aws configure:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('name')
for obj in bucket.objects.all():
    print(obj.key)

AWS CLI can let you see all files of an S3 bucket quickly and help in performing other operations too.
To use AWS CLI follow steps below:
- Install AWS CLI.
- Configure AWS CLI for using default security credentials and default AWS Region.
To see all files of an S3 bucket, use the command:
aws s3 ls s3://your_bucket_name --recursive
Reference to use AWS cli for different AWS services: https://docs.aws.amazon.com/cli/latest/reference/

In Java you can get the keys using ListObjects (see the AWS documentation):
FileWriter fileWriter;
BufferedWriter bufferedWriter;
// [...]

AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName)
        .withPrefix("myprefix");
ObjectListing objectListing;

do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        // write to file with e.g. a bufferedWriter
        bufferedWriter.write(objectSummary.getKey());
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

-
There is one more simple API available, which takes the bucket name and lists the objects present in it: ObjectListing objects = s3client.listObjects(bucketName). The javadoc link is given below: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#listObjects-java.lang.String- – Rajesh Jul 18 '16 at 17:01
I know it's an old topic, but I'd like to contribute too.
With the newer version of boto3 and Python, you can get the files as follows:
import os
import boto3
from botocore.exceptions import ClientError

client = boto3.client('s3')
bucket = client.list_objects(Bucket=BUCKET_NAME)
for content in bucket["Contents"]:
    key = content["Key"]  # note the capital "K" in "Key"
Keep in mind that this solution does not handle pagination.
For more information: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
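If you do need to handle more than 1,000 keys, boto3's built-in paginator takes care of the continuation tokens. Here is a minimal sketch (the bucket name and output file are placeholders, not part of the answer above):
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')

# Write every key in the (hypothetical) bucket to a local text file
with open('keys.txt', 'w') as f:
    for page in paginator.paginate(Bucket='my-bucket'):
        for obj in page.get('Contents', []):
            f.write(obj['Key'] + '\n')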

Here's a way to use the stock AWS CLI to generate a diff-able list of just object names:
aws s3api list-objects --bucket "$BUCKET" --query "Contents[].{Key: Key}" --output text
(based on https://stackoverflow.com/a/54378943/53529)
This gives you the full object name of every object in the bucket, separated by new lines. Useful if you want to diff between the contents of an S3 bucket and a GCS bucket, for example.

Code in python using the awesome "boto" lib. The code returns a list of files in a bucket and also handles exceptions for missing buckets.
import boto

conn = boto.connect_s3( <ACCESS_KEY>, <SECRET_KEY> )
try:
    bucket = conn.get_bucket( <BUCKET_NAME>, validate = True )
except boto.exception.S3ResponseError as e:
    do_something() # The bucket does not exist, choose how to deal with it or raise the exception

keys = [ key.name.encode( "utf-8" ) for key in bucket.list() ]
Don't forget to replace the < PLACE_HOLDERS > with your values.

Alternatively you can use Minio Client, aka mc. It's open source and compatible with AWS S3. It is available for Linux, Windows, Mac, and FreeBSD.
All you have to do is run the mc ls command to list the contents.
$ mc ls s3/kline/
[2016-04-30 13:20:47 IST] 1.1MiB 1.jpg
[2016-04-30 16:03:55 IST] 7.5KiB docker.png
[2016-04-30 15:16:17 IST] 50KiB pi.png
[2016-05-10 14:34:39 IST] 365KiB upton.pdf
Note:
- s3: Alias for Amazon S3
- kline: AWS S3 bucket name
Installing Minio Client on Linux. Download mc for:
- 64-bit Intel from https://dl.minio.io/client/mc/release/linux-amd64/mc
- 32-bit Intel from https://dl.minio.io/client/mc/release/linux-386/mc
- 32-bit ARM from https://dl.minio.io/client/mc/release/linux-arm/mc
$ chmod 755 mc
$ ./mc --help
Setting up AWS credentials with Minio Client
$ mc config host add mys3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12
Note: Please replace mys3 with the alias you would like for this account, and BKIKJAA5BMMU2RHO6IBB and V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 with your AWS ACCESS-KEY and SECRET-KEY.
Hope it helps.
Disclaimer: I work for Minio

The below command will get all the file names from your AWS S3 bucket and write them into a text file in your current directory:
aws s3 ls s3://Bucketdirectory/Subdirectory/ | cat >> FileNames.txt

function showUploads(){
    if (!class_exists('S3')) require_once 'S3.php';
    // AWS access info
    if (!defined('awsAccessKey')) define('awsAccessKey', '234567665464tg');
    if (!defined('awsSecretKey')) define('awsSecretKey', 'dfshgfhfghdgfhrt463457');
    $bucketName = 'my_bucket1234';
    $s3 = new S3(awsAccessKey, awsSecretKey);
    $contents = $s3->getBucket($bucketName);
    echo "<hr/>List of Files in bucket : {$bucketName} <hr/>";
    $n = 1;
    foreach ($contents as $p => $v):
        echo $p."<br/>";
        $n++;
    endforeach;
}

You can list all the files in the AWS S3 bucket using the command
aws s3 ls path/to/file
and to save it in a file, use
aws s3 ls path/to/file >> save_result.txt
if you want to append your result to a file, or:
aws s3 ls path/to/file > save_result.txt
if you want to clear what was written before.
It will work on both Windows and Linux.

In JavaScript you can use
s3.listObjects(params, function (err, result) {});
to get all objects inside the bucket. You have to pass the bucket name inside params (Bucket: name).

For getting full links, run:
aws s3 ls s3://bucket/ | awk '{print $4}' | xargs -I{} echo "s3://bucket/{}"

In PHP you can get a complete list of AWS S3 objects inside a specific bucket using the following call:
$S3 = \Aws\S3\S3Client::factory(array('region' => $region));
$iterator = $S3->getIterator('ListObjects', array('Bucket' => $bucket));
foreach ($iterator as $obj) {
    echo $obj['Key'];
}
You can redirect the output of the above code into a file to get the list of keys.

Simplified and updated version of the Scala answer by Paolo:
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}

def buildListing(s3: AmazonS3, request: ListObjectsRequest): List[S3ObjectSummary] = {
  def buildList(listIn: List[S3ObjectSummary], bucketList: ObjectListing): List[S3ObjectSummary] = {
    val latestList: List[S3ObjectSummary] = bucketList.getObjectSummaries.toList

    if (!bucketList.isTruncated) listIn ::: latestList
    else buildList(listIn ::: latestList, s3.listNextBatchOfObjects(bucketList))
  }

  buildList(List(), s3.listObjects(request))
}
Stripping out the generics and using the ListObjectsRequest generated by the SDK builders.

# find-like file listing for S3 files
aws s3api --profile <<profile-name>> \
  --endpoint-url=<<end-point-url>> list-objects \
  --bucket <<bucket-name>> --query 'Contents[].{Key: Key}'

-
Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its long-term value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – Toby Speight Dec 04 '17 at 15:03
Use plumbum to wrap the CLI and you will have a clear syntax:
import plumbum as pb
folders = pb.local['aws']('s3', 'ls')
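As a follow-up sketch (not part of the original answer), listing the objects of a specific bucket with the same wrapper might look like this; the bucket name is a placeholder:
import plumbum as pb

aws = pb.local['aws']
# '--recursive' lists every key under the (hypothetical) bucket
objects = aws('s3', 'ls', 's3://my-bucket', '--recursive')
print(objects)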

Please try this bash script. It uses the curl command with no need for any external dependencies.
bucket=<bucket_name>
region=<region_name>
awsAccess=<access_key>
awsSecret=<secret_key>
awsRegion="${region}"
baseUrl="s3.${awsRegion}.amazonaws.com"
m_sed() {
  if which gsed > /dev/null 2>&1; then
    gsed "$@"
  else
    sed "$@"
  fi
}
awsStringSign4() {
  kSecret="AWS4$1"
  kDate=$(printf '%s' "$2" | openssl dgst -sha256 -hex -mac HMAC -macopt "key:${kSecret}" 2>/dev/null | m_sed 's/^.* //')
  kRegion=$(printf '%s' "$3" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kDate}" 2>/dev/null | m_sed 's/^.* //')
  kService=$(printf '%s' "$4" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kRegion}" 2>/dev/null | m_sed 's/^.* //')
  kSigning=$(printf 'aws4_request' | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kService}" 2>/dev/null | m_sed 's/^.* //')
  signedString=$(printf '%s' "$5" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kSigning}" 2>/dev/null | m_sed 's/^.* //')
  printf '%s' "${signedString}"
}
if [ -z "${region}" ]; then
  region="${awsRegion}"
fi
# Initialize helper variables
authType='AWS4-HMAC-SHA256'
service="s3"
dateValueS=$(date -u +'%Y%m%d')
dateValueL=$(date -u +'%Y%m%dT%H%M%SZ')
# 0. Hash of an empty request payload (SHA-256 of the empty string)
# 1. Create canonical request
# NOTE: order significant in ${signedHeaders} and ${canonicalRequest}
signedHeaders='host;x-amz-content-sha256;x-amz-date'
canonicalRequest="\
GET
/
host:${bucket}.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:${dateValueL}
${signedHeaders}
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
# Hash it
canonicalRequestHash=$(printf '%s' "${canonicalRequest}" | openssl dgst -sha256 -hex 2>/dev/null | m_sed 's/^.* //')
# 2. Create string to sign
stringToSign="\
${authType}
${dateValueL}
${dateValueS}/${region}/${service}/aws4_request
${canonicalRequestHash}"
# 3. Sign the string
signature=$(awsStringSign4 "${awsSecret}" "${dateValueS}" "${region}" "${service}" "${stringToSign}")
# Send the signed GET request to list the bucket
curl -g -k "https://${baseUrl}/${bucket}" \
-H "x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" \
-H "x-amz-Date: ${dateValueL}" \
-H "Authorization: ${authType} Credential=${awsAccess}/${dateValueS}/${region}/${service}/aws4_request,SignedHeaders=${signedHeaders},Signature=${signature}"

public static Dictionary<string, DateTime> ListBucketsByCreationDate(string AccessKey, string SecretKey)
{
    return AWSClientFactory.CreateAmazonS3Client(AccessKey, SecretKey)
        .ListBuckets()
        .Buckets
        .ToDictionary(s3Bucket => s3Bucket.BucketName,
                      s3Bucket => DateTime.Parse(s3Bucket.CreationDate));
}

-
I guess this is a Java prototype or something, but please explain it. – Doncho Gunchev Jun 19 '12 at 01:00
This is an old question but the number of responses tells me many people hit this page.
The easiest way I found is to just use the built in AWS console for creating an inventory. It's easy to set up but the first CSV file can take up to 48 hours to show up. After that you can create either a daily or weekly output to a bucket of your choosing.
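Once the inventory reports start arriving, each delivery is a manifest plus one or more data files in the destination bucket. As a rough sketch (not part of the original answer), reading a CSV-format inventory with boto3 might look like the following; the bucket name, manifest key, and column order are assumptions to check against your own inventory configuration:
import boto3, csv, gzip, io, json

s3 = boto3.client('s3')
inventory_bucket = 'my-inventory-bucket'  # placeholder
manifest_key = 'source-bucket/config-id/2020-01-01T00-00Z/manifest.json'  # placeholder

# The manifest lists the gzipped CSV data files that make up the report
manifest = json.loads(s3.get_object(Bucket=inventory_bucket, Key=manifest_key)['Body'].read())
for data_file in manifest['files']:
    body = s3.get_object(Bucket=inventory_bucket, Key=data_file['key'])['Body'].read()
    with gzip.open(io.BytesIO(body), 'rt') as fh:
        for row in csv.reader(fh):
            print(row[1])  # assuming the second column holds the object key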

The EASIEST way to get a very usable text file is to download S3 Browser http://s3browser.com/ and use the Web URLs Generator to produce a list of complete link paths. It is very handy and involves about 3 clicks.
- Browse to Folder
- Select All
- Generate Urls
Best of luck to you.
