
Is there any Amazon S3 client library for Node.js that allows listing of all files in S3 bucket?

The best-known ones, aws2js and knox, don't seem to have this functionality.

nab
  • I would ask the author if he could implement it in aws2js. I think it would be very easy to do and he has been recently active in the project. Or if you are able, implement it yourself. – Fantius Feb 24 '12 at 20:44
  • You can also implement this specific request through their [REST API](http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTBucketGET.html) until there is support in one of the libraries. – Viccari Feb 25 '12 at 01:41
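For what it's worth, the bucket GET endpoint the comment above links to can be called directly. Below is a minimal, unauthenticated sketch: it only works against buckets that allow anonymous listing (private buckets need a signed request, which the SDKs handle for you), and the bucket name is a placeholder:

var https = require('https');

var bucket = 'my-public-bucket'; // hypothetical bucket that permits anonymous listing

https.get('https://' + bucket + '.s3.amazonaws.com/?prefix=', function (res) {
  var xml = '';
  res.on('data', function (chunk) { xml += chunk; });
  res.on('end', function () {
    // crude extraction of <Key> elements from the ListBucketResult XML
    var keys = (xml.match(/<Key>([^<]*)<\/Key>/g) || [])
      .map(function (k) { return k.replace(/<\/?Key>/g, ''); });
    console.log(keys);
  });
});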

16 Answers


Using the official aws-sdk:

var allKeys = [];
function listAllKeys(marker, cb)
{
  s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
    if (err) return cb(err);

    // Note: push() appends the whole page as a single element; see the
    // comments below about using concat() instead.
    allKeys.push(data.Contents);

    // Note: NextMarker is only returned when a Delimiter is set; see the
    // comments and later answers for the fallback to the last Key.
    if(data.IsTruncated)
      listAllKeys(data.NextMarker, cb);
    else
      cb();
  });
}

See s3.listObjects for details.

Edit 2017: Same basic idea, but listObjectsV2( ... ) is now recommended and uses a ContinuationToken (see s3.listObjectsV2):

var allKeys = [];
function listAllKeys(token, cb)
{
  var opts = { Bucket: s3bucket };
  if(token) opts.ContinuationToken = token;

  s3.listObjectsV2(opts, function(err, data){
    if (err) return cb(err);

    allKeys = allKeys.concat(data.Contents);

    if(data.IsTruncated)
      listAllKeys(data.NextContinuationToken, cb);
    else
      cb();
  });
}
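A minimal invocation sketch (assuming s3 and s3bucket are already configured as above):

listAllKeys(null, function (err) {
  if (err) return console.error(err);
  console.log(allKeys.length, 'keys collected');
});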
Meekohi
  • I can get something like this to work only when specifying a MaxKeys value in the listObjects parameters – Matt Jun 01 '15 at 21:11
  • Can anyone elaborate on Marker. Looked at the docs and am confused. If I omit it, I just get null for `data`. – kuanb Aug 20 '15 at 14:57
  • The `Marker` is a string that specifies the key to start with when listing objects in a bucket. It is optional: if you omit it you will see keys from the beginning (alphanumerically), so it sounds like there might be some other issue you're running into? – Meekohi Aug 20 '15 at 18:52
  • As a general tip, I often go back to the HTTP API Reference to verify these things, because the documentation for the javascript SDK is sometimes incomplete or inaccurate: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html – Meekohi Aug 20 '15 at 18:59
  • @kuanb Indeed, according to the documentation data.Marker will be null given the params above, since Delimiter is missing. Please see my answer below – Ken Lin Jan 20 '16 at 23:32
  • I have modified the code above: if data.NextMarker is undefined, we can use the last key name as the marker in the subsequent request to get the next set of objects. Change: from "listAllKeys(data.NextMarker, cb);" to "listAllKeys(data.NextMarker || data.Contents[data.Contents.length-1].Key, cb);", as suggested by Thijs :-) – Rajeev Jayaswal Jul 02 '17 at 06:25
  • Should it be concat instead of push? – Matt Vukas Aug 03 '17 at 04:16
  • @MattVukas either will work if you flatten afterwards: `push` appends each page as a nested array, whereas `concat` gives you a flat list. I agree `concat` reads more naturally, as in the 2017 edit; `push` was what the "official" AWS answer used and modifies the array in place. – Meekohi Aug 03 '17 at 14:06

Using AWS SDK v3 and TypeScript

import {
  paginateListObjectsV2,
  S3Client,
  S3ClientConfig,
} from '@aws-sdk/client-s3';

/* // For Deno
import {
  paginateListObjectsV2,
  S3Client,
  S3ClientConfig,
} from "https://deno.land/x/aws_sdk@v3.32.0-1/client-s3/mod.ts"; */

const s3Config: S3ClientConfig = {
  credentials: {
    accessKeyId: 'accessKeyId',
    secretAccessKey: 'secretAccessKey',
  },
  region: 'us-east-1',
};

const getAllS3Files = async (client: S3Client, s3Opts) => {
  const totalFiles = [];
  for await (const data of paginateListObjectsV2({ client }, s3Opts)) {
    totalFiles.push(...(data.Contents ?? []));
  }
  return totalFiles;
};

const main = async () => {
  const client = new S3Client(s3Config);
  const s3Opts = { Bucket: 'bucket-xyz' };
  console.log(await getAllS3Files(client, s3Opts));
};

main();

For AWS SDK v2, using an async generator

Import S3

const { S3 } = require('aws-sdk');
const s3 = new S3();

Create a generator function that retrieves the full file list:

async function* listAllKeys(opts) {
  opts = { ...opts };
  do {
    const data = await s3.listObjectsV2(opts).promise();
    opts.ContinuationToken = data.NextContinuationToken;
    yield data;
  } while (opts.ContinuationToken);
}

Prepare the AWS parameters, based on the API docs:

const opts = {
  Bucket: 'bucket-xyz' /* required */,
  // ContinuationToken: 'STRING_VALUE',
  // Delimiter: 'STRING_VALUE',
  // EncodingType: url,
  // FetchOwner: true || false,
  // MaxKeys: 'NUMBER_VALUE',
  // Prefix: 'STRING_VALUE',
  // RequestPayer: requester,
  // StartAfter: 'STRING_VALUE'
};

Use the generator:

async function main() {
  // using for of await loop
  for await (const data of listAllKeys(opts)) {
    console.log(data.Contents);
  }
}
main();

That's it.

Or Lazy Load

async function main() {
  const keys = listAllKeys(opts);
  console.log(await keys.next());
  // {value: {…}, done: false}
  console.log(await keys.next());
  // {value: {…}, done: false}
  console.log(await keys.next());
  // {value: undefined, done: true}
}
main();

Or use the generator to make an observable function:

const lister = (opts) => (o$) => {
  let needMore = true;
  const process = async () => {
    for await (const data of listAllKeys(opts)) {
      o$.next(data);
      if (!needMore) break;
    }
    o$.complete();
  };
  process();
  return () => (needMore = false);
};

Use this observable function with RxJS:

// Using Rxjs

const { Observable } = require('rxjs');
const { flatMap } = require('rxjs/operators');

function listAll() {
  return Observable.create(lister(opts))
    .pipe(flatMap((v) => v.Contents))
    .subscribe(console.log);
}

listAll();

Or use this observable function with the Node.js EventEmitter:

const EventEmitter = require('events');

const _eve = new EventEmitter();

async function onData(data) {
  // will be called for each set of data
  console.log(data);
}
async function onError(error) {
  // will be called if any error
  console.log(error);
}
async function onComplete() {
  // will be called when data completely received
}
_eve.on('next', onData);
_eve.on('error', onError);
_eve.on('complete', onComplete);

const stop = lister(opts)({
  next: (v) => _eve.emit('next', v),
  error: (e) => _eve.emit('error', e),
  complete: (v) => _eve.emit('complete', v),
});
nkitku

Here's Node code I wrote to assemble the S3 objects from truncated lists.

var params = {
    Bucket: <yourbucket>,
    Prefix: <yourprefix>,
};

var s3DataContents = [];    // Single array of all combined S3 data.Contents

function s3Print() {
    if (program.al) {
        // --al: Print all objects
        console.log(JSON.stringify(s3DataContents, null, "    "));
    } else {
        // --b: Print key only, otherwise also print index 
        var i;
        for (i = 0; i < s3DataContents.length; i++) {
            var head = !program.b ? (i+1) + ': ' : '';
            console.log(head + s3DataContents[i].Key);
        }
    }
}

function s3ListObjects(params, cb) {
    s3.listObjects(params, function(err, data) {
        if (err) {
            console.log("listS3Objects Error:", err);
        } else {
            var contents = data.Contents;
            s3DataContents = s3DataContents.concat(contents);
            if (data.IsTruncated) {
                // Set Marker to last returned key
                params.Marker = contents[contents.length-1].Key;
                s3ListObjects(params, cb);
            } else {
                cb();
            }
        }
    });
}

s3ListObjects(params, s3Print);

Pay attention to the listObjects documentation of NextMarker, which is NOT always present in the returned data object, so I don't use it at all in the above code:

NextMarker — (String) When the response is truncated (the IsTruncated element value in the response is true), you can use the key name in this field as a marker in the subsequent request to get the next set of objects. Amazon S3 lists objects in alphabetical order. Note: This element is returned only if you have the delimiter request parameter specified. If the response does not include the NextMarker and it is truncated, you can use the value of the last Key in the response as the marker in the subsequent request to get the next set of object keys.

The entire program has now been pushed to https://github.com/kenklin/s3list.

Ken Lin

In fact, aws2js supports listing the objects in a bucket at a low level via the s3.get() method call. To do it, pass the prefix parameter documented on the Amazon S3 REST API page:

var s3 = require('aws2js').load('s3', awsAccessKeyId, awsSecretAccessKey);    
s3.setBucket(bucketName);

var folder = encodeURI('some/path/to/S3/folder');
var url = '?prefix=' + folder;

s3.get(url, 'xml', function (error, data) {
    console.log(error);
    console.log(data);
});

The data variable in the above snippet contains the parsed listing for the bucketName bucket. Note that a single response returns at most 1,000 keys; see the paging sketch below for fetching the rest.
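A hedged paging sketch, reusing the same s3.get() call and assuming the parsed XML mirrors the REST ListBucketResult fields (IsTruncated, Contents[].Key):

var allKeys = [];

function listPage(marker) {
  var url = '?prefix=' + folder + (marker ? '&marker=' + encodeURIComponent(marker) : '');
  s3.get(url, 'xml', function (error, data) {
    if (error) return console.log(error);
    var contents = [].concat(data.Contents || []); // normalize single object vs array
    contents.forEach(function (c) { allKeys.push(c.Key); });
    if (data.IsTruncated === 'true' && contents.length) {
      // the last key of this page becomes the marker for the next page
      listPage(contents[contents.length - 1].Key);
    } else {
      console.log(allKeys);
    }
  });
}

listPage();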

nab
  • Although this is set as the right/selected answer, it should be noted that https://github.com/SaltwaterC/aws2js has been deprecated. Upon npm install it informs one that "aws2js is deprecated. Please use aws-sdk." – kuanb Aug 20 '15 at 14:31

I published knox-copy when I couldn't find a good existing solution. It wraps all the pagination details of the REST API in a familiar Node stream:

var knoxCopy = require('knox-copy');

var client = knoxCopy.createClient({
  key: '<api-key-here>',
  secret: '<secret-here>',
  bucket: 'mrbucket'
});

client.streamKeys({
  // omit the prefix to list the whole bucket
  prefix: 'buckets/of/fun' 
}).on('data', function(key) {
  console.log(key);
});

If you're listing fewer than 1000 files, a single page will work:

client.listPageOfKeys({
  prefix: 'smaller/bucket/o/fun'
}, function(err, page) {
  console.log(page.Contents); // <- Here's your list of files
});
hurrymaplelad

Meekohi provided a very good answer, but the (new) documentation states that NextMarker can be undefined. When this is the case, you should use the last key as the marker.

So his code sample can be changed into:

var allKeys = [];
function listAllKeys(marker, cb) {
  s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
    if (err) return cb(err);
    allKeys.push(data.Contents);
    if(data.IsTruncated)
      listAllKeys(data.NextMarker || data.Contents[data.Contents.length-1].Key, cb);
    else
      cb();
  });
}

Couldn't comment on the original answer since I don't have the required reputation. Apologies for the bad mark-up btw.

victorkt

I am using this version with async/await.
This function will return the contents in an array.
I'm also using NextContinuationToken instead of the Marker.

async function getFilesRecursivelySub(param) {

    // Call the function to get list of items from S3.
    let result = await s3.listObjectsV2(param).promise();

    if(!result.IsTruncated) {
        // Recursive terminating condition.
        return result.Contents;
    } else {
        // Recurse it if results are truncated.
        param.ContinuationToken = result.NextContinuationToken;
        return result.Contents.concat(await getFilesRecursivelySub(param));
    }
}

async function getFilesRecursively() {

    let param = {
        Bucket: 'YOUR_BUCKET_NAME'
        // Can add more parameters here.
    };

    return await getFilesRecursivelySub(param);
}
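A hypothetical call site:

getFilesRecursively()
    .then(contents => console.log('Found', contents.length, 'objects'))
    .catch(console.error);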
John Tng

This is an old question and I guess the AWS JS SDK has changed a lot since it was asked. Here's yet another way to do it these days:

s3.listObjects({Bucket:'mybucket', Prefix:'some-pfx'})
  .on('success', function handlePage(r) {
    // ... handle page of contents r.data.Contents

    if (r.hasNextPage()) {
      // There's another page; handle it
      r.nextPage().on('success', handlePage).send();
    } else {
      // Finished!
    }
  })
  .on('error', function(r) {
    // Error!
  })
  .send();
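For what it's worth, the v2 SDK also exposes an eachPage() helper on AWS.Request that hides the nextPage() bookkeeping. A rough sketch (bucket and prefix are placeholders; check your SDK version):

s3.listObjects({Bucket: 'mybucket', Prefix: 'some-pfx'}).eachPage(function(err, data) {
  if (err) { console.error(err); return false; } // returning false stops paging
  if (data) {
    // one page of results
    console.log(data.Contents.length, 'keys in this page');
  } else {
    // data === null means every page has been delivered
    console.log('done');
  }
  return true; // continue to the next page
});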
logidelic

If you want to get the list of keys only within a specific folder inside an S3 bucket, then this will be useful.

Basically, the listObjects function starts searching from the Marker we set and keeps going until it reaches the MaxKeys: 1000 limit, so it walks keys in order and returns the first 1000 it finds, which may span several folders in the bucket.

Consider that I have many folders inside my bucket with a prefix like prod/<some date>/, e.g. prod/2017/05/12/, prod/2017/05/13/, etc.

I want to fetch the list of objects (file names) only within the prod/2017/05/12/ folder, so I specify prod/2017/05/12/ as my start and prod/2017/05/13/ (the next folder name) as my end, and in the code I break the loop when I encounter the end.

Each entry in data.Contents will look like this:

{
  Key: 'prod/2017/05/13/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
  LastModified: 2017-05-13T00:59:02.000Z,
  ETag: '"630b2sdfsdfs49ef392bcc16c833004f94ae850"',
  Size: 134236366,
  StorageClass: 'STANDARD',
  Owner: { }
}

Code:

var list = [];

function listAllKeys(s3bucket, start, end) {
  s3.listObjects({
    Bucket: s3bucket,
    Marker: start,
    MaxKeys: 1000,
  }, function(err, data) {
    if (err) return console.log(err);
    if (data.Contents) {
      for (var i = 0; i < data.Contents.length; i++) {
        var key = data.Contents[i].Key;    // See above for the structure of data.Contents
        if (key.substring(0, end.length) !== end) {
          list.push(key);
        } else {
          break;   // break the loop once the end folder is reached
        }
      }
      console.log(list);
      console.log('Total - ', list.length);
    }
  });
}

listAllKeys('BucketName', 'prod/2017/05/12/', 'prod/2017/05/13/');

Output:

[ 'prod/2017/05/12/05/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
  'prod/2017/05/12/05/a36528b9-e071-4b83-a7e6-9b32d6bce6d8.jpg',
  'prod/2017/05/12/05/bc4d6d4b-4455-48b3-a548-7a714c489060.jpg',
  'prod/2017/05/12/05/f4b8d599-80d0-46fa-a996-e73b8fd0cd6d.jpg',
  ... 689 more items ]
Total - 692
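For what it's worth, the Prefix parameter restricts the listing to one "folder" without having to break out of the loop manually. A minimal sketch (the bucket name is a placeholder):

s3.listObjects({
  Bucket: 'BucketName',
  Prefix: 'prod/2017/05/12/',   // only keys under this prefix are returned
  MaxKeys: 1000,
}, function (err, data) {
  if (err) return console.log(err);
  console.log(data.Contents.map(function (c) { return c.Key; }));
  // if data.IsTruncated, page with Marker (or listObjectsV2's ContinuationToken) as in the other answers
});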
Prasanth Jaya

I ended up building a wrapper function around listObjectsV2. It works the same way and takes the same parameters, but recurses until IsTruncated is false and returns all the keys found as an array in the second parameter of the callback function.

const AWS = require('aws-sdk')
const s3 = new AWS.S3()

function listAllKeys(params, cb)
{
   var keys = []
   if(params.data){
      keys = keys.concat(params.data)
   }
   delete params['data']

   s3.listObjectsV2(params, function(err, data){
     if(err){
       cb(err)
     } else if (data.IsTruncated) {
       params['ContinuationToken'] = data.NextContinuationToken
       params['data'] = keys.concat(data.Contents) // carry forward the pages collected so far
       listAllKeys(params, cb)
     } else {
       keys = keys.concat(data.Contents)
       cb(null,keys)
     }
   })
}
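A hypothetical invocation (the bucket name is a placeholder):

listAllKeys({ Bucket: 'my-bucket', Prefix: 'some/prefix/' }, function (err, keys) {
  if (err) return console.error(err)
  console.log(keys.length, 'keys found')
})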
Carlos Rodriguez

Here's what I came up with based on the other answers.
You can await listAllKeys() without having to use callbacks.

const listAllKeys = () =>
  new Promise((resolve, reject) => {
    let allKeys = [];
    const list = marker => {
      s3.listObjects({ Marker: marker }, (err, data) => {
        if (err) {
          reject(err);
        } else if (data.IsTruncated) {
          allKeys = allKeys.concat(data.Contents); // concat keeps the list flat
          list(data.NextMarker || data.Contents[data.Contents.length - 1].Key);
        } else {
          allKeys = allKeys.concat(data.Contents);
          resolve(allKeys);
        }
      });
    };
    list();
  });

This assumes you've initialized the s3 variable like so

const s3 = new aws.S3({
  apiVersion: API_VERSION,
  params: { Bucket: BUCKET_NAME }
});
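A hypothetical call site:

(async () => {
  const keys = await listAllKeys();
  console.log(keys.length, 'objects found');
})();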
Stephen

I made it as simple as possible. You can iterate over the objects to upload using a for loop; it is quite simple, neat and easy to understand.
Packages required: fs, express-fileupload

server.js:

router.post('/upload', function(req, res){
    if(req.files){
        var file = req.files.filename;
        test(file);
        res.render('test');
    }
});

The test() function:

function test(file){
  // upload all
  if(file.length){
    for(var i = 0; i < file.length; i++){
      fileUP(file[i]);
    }
  } else {
    fileUP(file);
  }

  // call fileUP() to upload one file at a time
  function fileUP(fyl){
    var filename = fyl.name;
    var tempPath = './temp' + filename;
    fyl.mv(tempPath, function(err){
      fs.readFile(tempPath, function(err, data){
        var params = {
          Bucket: 'BUCKET_NAME',
          Body: data,
          Key: Date.now() + filename
        };

        s3.upload(params, function (err, data) {
          if (data) {
            fs.unlink(tempPath, (err) => {
              if (err) {
                console.error(err);
                return;
              } else {
                console.log("file removed from temp location");
              }
            });
            console.log("Uploaded in:", data.Location);
          }
        });
      });
    });
  }
}

kartik tyagi

This should work:

const params = { Bucket: 'your-bucket-name' }; // placeholder bucket name

var listAllKeys = async function (token) {
  if (token) params.ContinuationToken = token;

  return new Promise((resolve, reject) => {
    s3.listObjectsV2(params, function (err, data) {
      if (err) {
        return reject(err);
      }
      resolve(data);
    });
  });
}

var collect_all_files = async function () {
  var allkeys = [];
  let conti = true;
  let token = null;
  while (conti) {
    const data = await listAllKeys(token);
    allkeys = allkeys.concat(data.Contents);
    token = data.NextContinuationToken;
    conti = data.IsTruncated;
  }
  return allkeys;
};
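Hypothetical usage:

collect_all_files()
  .then(keys => console.log('total objects:', keys.length))
  .catch(console.error);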
  • Your answer could be improved by adding more information on what the code does and how it helps the OP. – Tyler2P Jul 06 '22 at 19:35

Using the newer s3.listObjectsV2 API, the recursive solution will be:

S3Dataset.prototype.listFiles = function(params,callback) {
    var self=this;

    var options = {
    };
    for (var attrname in params) { options[attrname] = params[attrname]; }

    var results=[];
    var s3=self.s3Store.GetInstance();
    function listAllKeys(token, callback) {
        var opt={ Bucket: self._options.s3.Bucket, Prefix: self._options.s3.Key, MaxKeys: 1000 };
        if(token) opt.ContinuationToken = token;
        s3.listObjectsV2(opt, (error, data) => {
            if (error) {
                if(self.logger) this.logger.error("listFiles error:", error);
                return callback(error);
            } else {
                for (var index in data.Contents) {
                    var bucket = data.Contents[index];
                    if(self.logger) self.logger.debug("listFiles Key: %s LastModified: %s Size: %s", bucket.Key, bucket.LastModified, bucket.Size);
                    if(bucket.Size>0) {
                        var Bucket=self._options.s3.Bucket;
                        var Key=bucket.Key;
                        var components=bucket.Key.split('/');
                        var name=components[components.length-1];
                        results.push({
                            name: name,
                            path: bucket.Key,
                            mtime: bucket.LastModified,
                            size: bucket.Size,
                            sizehr: formatSizeUnits(bucket.Size)
                        });
                    }
                }
                if( data.IsTruncated ) { // truncated page
                    return listAllKeys(data.NextContinuationToken, callback);
                } else {
                    return callback(null,results);
                }
            }
        });
    }
    return listAllKeys.apply(this,['',callback]);
};

where

function formatSizeUnits(bytes){
    if      (bytes>=1099511627776) {bytes=(bytes/1099511627776).toFixed(4)+' TB';}
    else if (bytes>=1073741824)    {bytes=(bytes/1073741824).toFixed(4)+' GB';}
    else if (bytes>=1048576)       {bytes=(bytes/1048576).toFixed(4)+' MB';}
    else if (bytes>=1024)          {bytes=(bytes/1024).toFixed(4)+' KB';}
    else if (bytes>1)              {bytes=bytes+' bytes';}
    else if (bytes==1)             {bytes=bytes+' byte';}
    else                           {bytes='0 byte';}
    return bytes;
}//formatSizeUnits
loretoparisi

Although @Meekohi's answer does technically work, I've had enough heartache with the S3 portion of the AWS SDK for Node.js. After all the previous struggling with modules such as aws-sdk, s3, and knox, I decided to install s3cmd via the OS package manager and shell out to it using child_process.

Something like:

    var s3cmd = new cmd_exec('s3cmd', ['ls', filepath, 's3://'+inputBucket],
            function (me, data) {me.stdout += data.toString();},
            function (me) {me.exit = 1;}
    );
    response.send(s3cmd.stdout);

(Using the cmd_exec implementation from this question)

This approach just works really well, including for other problematic things like file upload.
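A roughly equivalent sketch using Node's built-in child_process directly, rather than the cmd_exec helper referenced above (inputBucket is assumed to be defined as in the snippet):

var execFile = require('child_process').execFile;

execFile('s3cmd', ['ls', 's3://' + inputBucket], function (err, stdout, stderr) {
  if (err) return console.error(err, stderr);
  // stdout contains one "DATE TIME SIZE s3://bucket/key" line per object
  console.log(stdout);
});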

CrazyPyro

The cleanest way to do it for me was to execute s3cmd from my Node script like this (the example here deletes files recursively):

var exec = require('child_process').exec;
var child;
var bucket = "myBucket";
var prefix = "myPrefix"; // this parameter is optional
var command = "s3cmd del -r s3://" + bucket + "/" + prefix;
child = exec(command, {maxBuffer: 5000 * 1024}, function (error, stdout, stderr) { // the maxBuffer is here to avoid the maxBuffer node process error
            console.log('stdout: ' + stdout);
            if (error !== null) {
                console.log('exec error: ' + error);
            }
        });
Amaynut
  • Why would you do it via command line when amazon provides a whole package just for that? – dcohenb Mar 15 '16 at 08:32
  • to be fair, there was a time when s3cmd was about the only way to interact with s3 other than direct to the APIs - and the SDKs we value so highly were only something we wish we had. That said, it's not 2007 anymore..... you really shouldn't be wrapping command line tools directly like that if you can at all avoid it. – keen Nov 03 '16 at 20:08