21

I have an Amazon S3 bucket that has tens of thousands of files in it. What's the easiest way to get a list of all the files, or a text file that lists all the filenames in the bucket?

I have tried listObjects(), but it seems to list only 1000 files.

Related: amazon-s3-returns-only-1000-entries-for-one-bucket-and-all-for-another-bucket-u, S3-Provider-does-not-get-more-than-1000-items-from-bucket

I also looked at "Listing Keys Using the AWS SDK for PHP", but in the AWS docs I read:

max-keys - string - Optional - The maximum number of results returned by the method call. The returned list will contain no more results than the specified value, but may return fewer. The default value is 1000.

(Source: AWS doc for list_objects)

Is there some way to list them all and write the list to a text file using the AWS SDK for PHP?

Possible duplicate: quick-way-to-list-all-files-in-amazon-s3-bucket

I have reposted the question because I am looking for a solution in PHP.

Code:

$s3Client = S3Client::factory(array('key' => $access, 'secret' => $secret));

$response = $s3Client->listObjects(array('Bucket' => $bucket, 'MaxKeys' => 1000, 'Prefix' => 'files/'));
$files = $response->getPath('Contents');
$request_id = array();
foreach ($files as $file) {
    $filename = $file['Key'];
    print "\n\nFilename:". $filename;
}
Hitesh
  • Note that in newer versions of the PHP SDK the client must be created like this instead: `$s3Client = S3Client::factory(array('credentials' => array('key' => $access, 'secret' => $secret)));` – TheStoryCoder Mar 09 '16 at 15:05
  • @TheStoryCoder : Thanks for information – Hitesh Mar 10 '16 at 03:11

3 Answers

22

To get more than 1000 objects, you must make multiple requests using the Marker parameter to tell S3 where you left off for each request. Using the Iterators feature of the AWS SDK for PHP makes it easier to get all of your objects, because it encapsulates the logic of making multiple API requests. Try this:

$objects = $s3Client->getListObjectsIterator(array(
    'Bucket' => $bucket,
    'Prefix' => 'files/'
));

foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}

With the latest PHP SDK (as of March 2016), the code must be written like this instead:

$objects = $s3Client->getIterator('ListObjects', array(
    'Bucket' => $bucket,
    'Prefix' => 'files/'
));
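For reference, the Marker loop that the iterator hides can be sketched roughly as follows. This is an illustration under stated assumptions, not SDK code: `listAllKeys` and `$fetchPage` are hypothetical names, and `$fetchPage` stands in for one `listObjects` call that returns a single page as an array. The demo at the bottom uses fake pages so the sketch runs without credentials.

```php
<?php
// Collect every key by following S3's Marker-based pagination.
// $fetchPage (hypothetical) receives the current marker (null for the first
// page) and returns one ListObjects-style page as an array.
function listAllKeys(callable $fetchPage)
{
    $keys = array();
    $marker = null;
    do {
        $page = $fetchPage($marker);
        $contents = isset($page['Contents']) ? $page['Contents'] : array();
        foreach ($contents as $object) {
            $keys[] = $object['Key'];
        }
        // Without a Delimiter, the last key of a page is the marker for the next page.
        if ($contents) {
            $marker = end($contents)['Key'];
        }
        // Stop when S3 reports no more pages (or a page comes back empty).
        $truncated = !empty($page['IsTruncated']) && $contents;
    } while ($truncated);

    return $keys;
}

// With the SDK, $fetchPage could look like this (untested sketch):
// $fetchPage = function ($marker) use ($s3Client, $bucket) {
//     $params = array('Bucket' => $bucket, 'Prefix' => 'files/');
//     if ($marker !== null) { $params['Marker'] = $marker; }
//     return $s3Client->listObjects($params)->toArray();
// };

// Demo with two fake pages instead of real S3 responses.
$fakePages = function ($marker) {
    if ($marker === null) {
        return array(
            'Contents' => array(array('Key' => 'files/a.txt'), array('Key' => 'files/b.txt')),
            'IsTruncated' => true,
        );
    }
    return array('Contents' => array(array('Key' => 'files/c.txt')), 'IsTruncated' => false);
};

echo implode(PHP_EOL, listAllKeys($fakePages)), PHP_EOL;
```

To get the text file asked for in the question, the returned array can be written out with `file_put_contents('CDNFileList.txt', implode(PHP_EOL, $keys) . PHP_EOL)`. In practice the iterator above is the simpler choice; the loop is only meant to show what it does on your behalf.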
TheStoryCoder
Jeremy Lindblom
  • You are a little late, I found the same thing, but thanks a lot. I had worked around this to make it work for me, but I found that each request lists only 1000 keys, and any search (say, for only PDF files) runs over just those 1000 results. So I still cannot be sure about the count, but this is the only trick I have found to do this. – Hitesh Mar 06 '14 at 05:44
  • @Jeremy: which one would you recommend, your answer or mine, and why? – Hitesh Mar 19 '14 at 10:11
  • I would say that mine would be better for the following reasons: 1.) It requires fewer requests. Given that `n` is the total number of S3 objects in your bucket with the `files/` prefix, my solution requires ceil(n/1000) requests. Yours always requires exactly 52 requests, whether or not they are all needed. 2.) Mine only requires one loop instead of 2 loops with nested loops. 3.) If you had more than 1000 files that started with a particular letter, your solution only captures the first 1000 per letter. Mine will always get all of your objects. – Jeremy Lindblom Mar 19 '14 at 20:23
  • I think to get the full URL you can just do this: sprintf('http://%s.s3.amazonaws.com/%s', $bucket_name, $object['Key']) – supersan Sep 27 '15 at 22:04
9

Use a paginator to get all files:

    $client = new S3Client([
        'version' => AWS_S3_CLIENT_FACTORY_VERSION,
        'region' => AWS_S3_CLIENT_FACTORY_REGION,

    ]);
    $objects = $client->getPaginator('ListObjects', ['Bucket' => "my-bucket"]);
    foreach ($objects as $listResponse) {
        $items = $listResponse->search("Contents[?starts_with(Key,'path/to/folder/')]");
        foreach($items as $item) {
            echo $item['Key'] . PHP_EOL;
        }
    }

To get all files, change the search to:

$listResponse->search("Contents[*]");
Dror Dromi
1

The code below is just a workaround for this problem. My CDN bucket folder contains many folders named alphabetically (a-z and A-Z), so I made multiple requests, one per letter, to list all the files.

This code lists mp4, pdf, png, jpg, or all files:

// Letter range a-z and A-Z, handled in a single loop
$letters = array_merge(range('a', 'z'), range('A', 'Z'));
// Running total of matching files
$total = 0;
// Text file that receives the list of URLs
$File = "CDNFileList.txt";
// URLs collected across all letters
$file_list = array();
// Note: $s3Client, $bucket and $baseUrl are assumed to be defined earlier

// Selected value from the dropdown list
$selectedoption = $_POST['cdn_dropdown_list'];
$labels = array(
    'pdf' => 'PDF DOCUMENTS',
    'jpg' => 'JPEG IMAGES',
    'png' => 'PNG IMAGES',
    'mp4' => 'MP4 VIDEOS',
    'all' => 'ALL CONTENTS',
);
$file_ext = isset($labels[$selectedoption]) ? $labels[$selectedoption] : '';

// Table header
echo "<table style='width:300px' border='1'><tr><th colspan='2'><b>List of $file_ext</b></th></tr><tr><td><b>Name of the File</b></td><td><b>URL of the file</b></td></tr>";

foreach ($letters as $value) {
    // One request per leading letter; each request still returns at most 1000 keys
    $response = $s3Client->listObjects(array('Bucket' => $bucket, 'MaxKeys' => 1000, 'Prefix' => 'files/'.$value));
    $files = $response->getPath('Contents');
    if (!$files) {
        continue; // no keys under this prefix
    }
    foreach ($files as $file) {
        $filename = $file['Key'];
        $filetype = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
        // Skip keys that do not match the selected type, unless 'all' was chosen
        if ($selectedoption != 'all' && $filetype != $selectedoption) {
            continue;
        }
        $file_name = pathinfo($filename, PATHINFO_FILENAME);
        $url = $baseUrl.$filename;
        echo "<tr><td>$file_name</td><td><a href='$url' target='_blank'>$url</a></td></tr>";
        $file_list[] = $url.PHP_EOL;
        $total++;
    }
}
echo "</table><br/>";
print "\n\nTOTAL NO OF $file_ext ".$total;
// Save the collected URLs to the text file
file_put_contents($File, $file_list);

This is just a workaround for the 1000-key limit of a single listObjects call. Note that it still returns at most 1000 keys per letter. Hope it helps someone.
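The extension filter itself does not depend on the per-letter requests. As a minimal sketch (the helper name `filterKeysByExtension` is hypothetical, not part of the SDK), it can be applied to any array of keys, however they were listed:

```php
<?php
// Hypothetical helper: keep only the keys whose extension matches $extension.
// Pass 'all' to keep every key, mirroring the dropdown values used above.
function filterKeysByExtension(array $keys, $extension)
{
    if ($extension === 'all') {
        return array_values($keys);
    }
    $wanted = strtolower($extension);
    $matches = array();
    foreach ($keys as $key) {
        // pathinfo() yields an empty string for keys without an extension
        if (strtolower(pathinfo($key, PATHINFO_EXTENSION)) === $wanted) {
            $matches[] = $key;
        }
    }
    return $matches;
}

// Example keys (made up for illustration)
$keys = array('files/a.pdf', 'files/b.PDF', 'files/c.png', 'files/readme');
print_r(filterKeysByExtension($keys, 'pdf'));
```

The comparison is case-insensitive, so `files/b.PDF` is kept alongside `files/a.pdf`, and a key with no extension never matches a specific type.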

Hitesh
  • I am getting an error while listing S3 objects: PHP Fatal error: Uncaught Aws\S3\Exception\PermanentRedirectException: AWS Error Code: PermanentRedirect, Status Code: 301, – Abhinav Feb 01 '16 at 19:29
  • AWS Request ID: E07DWDWED, AWS Error Type: client, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint: "images.cser.in.s3.amazonaws.com"., User-Agent: aws-sdk-php2/2.7.27 Guzzle/3.9.3 curl/7.40.0 PHP/5.5.31 ITR – Abhinav Feb 01 '16 at 19:29
  • thrown in /var/www/html/app/s3/vendor/aws/aws-sdk-php/src/Aws/Common/Exception/NamespaceExceptionFactory.php on line 91 – Abhinav Feb 01 '16 at 19:29
  • Please ask this as a separate question on SO describing the scenario. If you have already asked the question, share the link. – Hitesh Feb 08 '16 at 10:01