I want to fetch objects modified after a particular date. Using the AWS CLI, I can list them with the command below:

aws s3api list-objects-v2 --bucket "bucket1" --prefix "file-" --query "(Contents[?LastModified>'2019-02-06T05:34:12.000Z'])[0]"

But I want to do this from my code, so please let me know how I can filter objects using the AWS SDK npm package (aws-sdk).

Note: I could do it using exec or spawn, but that would require configuring a profile with the CLI, which creates a credentials file on the local machine, and I don't want to do that.

Sachin

1 Answer

Use the AWS SDK for Node.js. Call the listObjectsV2 method, then use jmespath.js in the callback to filter the output of the API call. This is equivalent to what the AWS CLI does with the --query parameter.

Something like this (untested):

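var AWS = require("aws-sdk");
var jmespath = require("jmespath");

var s3 = new AWS.S3();
var params = {
  Bucket: "bucket1",
  Prefix: "file-"
};
s3.listObjectsV2(params, function(err, data) {
  if (err) {
    console.log(err, err.stack); // an error occurred
  } else {
    // Same expression as the CLI --query. Note: the SDK returns LastModified
    // as a Date object, not a string, so this comparison may need adapting.
    var query = "Contents[?LastModified>'2019-02-06T05:34:12.000Z']";
    var results = jmespath.search(data, query);
    console.log(results);
  }
});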
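
For comparison, here is a minimal sketch of the same date filter written without JMESPath. It assumes the v2 SDK's behavior of returning LastModified as a Date object (ISO strings are accepted too). The helper name modifiedAfter is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: keep only objects modified strictly after `cutoff`.
// `contents` is the Contents array from a listObjectsV2 response;
// LastModified may be a Date (as the v2 SDK returns it) or an ISO string.
function modifiedAfter(contents, cutoff) {
  const cutoffMs = new Date(cutoff).getTime();
  return (contents || []).filter(
    (obj) => new Date(obj.LastModified).getTime() > cutoffMs
  );
}
```

Inside the callback this would be used as modifiedAfter(data.Contents, "2019-02-06T05:34:12.000Z"). Comparing timestamps numerically avoids depending on how a particular JMESPath implementation orders non-numeric values.
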
bwest
  • I have lakhs of files in my S3 bucket. In this case, will I have to fetch all the objects and then filter them? With the --query parameter in the CLI, I get only the filtered records. – Sachin Feb 06 '19 at 17:36
  • Yes, but this is the same way the CLI works. It retrieves the result set from the API and then filters the output. It all happens on the client side using JMESPath. – bwest Feb 06 '19 at 17:40
  • Okay. I tried to find out how to use jmespath with listObjectsV2 but didn't get it. Can you please give me an example of using it in the callback to get the filtered result? – Sachin Feb 06 '19 at 17:46
  • added a quick example – bwest Feb 06 '19 at 18:20
  • I think it is searching only the first 1000 records and not returning data after 6th Feb, but when I use the CLI command I do get the result. I have thousands of records from before 6th Feb. – Sachin Feb 06 '19 at 18:46
  • Yes that's correct, as the docs say "Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket." You will have to handle paging. https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html – bwest Feb 06 '19 at 18:48
  • Yes, that's right. I can't, or don't want to, traverse lakhs of objects to fetch the result (suppose the file names are not based on dates). I want to fetch the result in a single hit, just as with the CLI command mentioned above. – Sachin Feb 06 '19 at 19:58
  • The CLI results are paginated too; the client just handles the paging for you in the background. https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html Here's some code to help you handle the paging: https://stackoverflow.com/a/18324270/401096 – bwest Feb 06 '19 at 21:36
  • Certainly, loading lakhs of records into server memory and then filtering them with a library doesn't make sense. It can bog down the server for just one request. I'm also looking for a solution here. @Sachin, which solution did you go with? Did you end up calling exec/spawn, or did you find a way to filter using the AWS SDK? – Shishir Sonekar Jun 13 '22 at 09:41
  • What are your restrictions that prevent you from filtering in memory? Saying it doesn't make sense generally... well, it's the approach the official AWS tooling uses, so more likely your case is the exception. What makes it so? – bwest Jun 13 '22 at 16:07
  • I have millions of files in the bucket to filter. How can loading the listing for millions of files onto the server and then filtering it there be an efficient solution? – Shishir Sonekar Jun 14 '22 at 06:44
  • Have you done any benchmarking to determine the actual impact? Is there a resource limitation you are dealing with? It sounds like you are assuming there will be a problem instead of reporting one – bwest Jun 14 '22 at 15:00
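
The paging discussed in the comments above can be sketched roughly as follows. This is an untested-against-AWS illustration, assuming `s3` is an AWS SDK v2 S3 client (for example `new AWS.S3()`) whose requests expose `.promise()`. The function name listObjectsAfter is hypothetical. Each page is filtered as it arrives, so only matching objects stay in memory, although every page is still requested from S3.

```javascript
// Sketch: walk all pages of listObjectsV2 and keep only objects whose
// LastModified is strictly after `cutoff` (a Date or ISO string).
// `s3` is assumed to be an AWS SDK v2 S3 client; `params` holds Bucket/Prefix.
async function listObjectsAfter(s3, params, cutoff) {
  const cutoffMs = new Date(cutoff).getTime();
  const matches = [];
  let token = null; // no continuation token on the first request

  do {
    // Only send ContinuationToken once a previous page has supplied one.
    const request = token
      ? { ...params, ContinuationToken: token }
      : { ...params };
    const page = await s3.listObjectsV2(request).promise();

    // Filter this page immediately instead of accumulating the full listing.
    for (const obj of page.Contents || []) {
      if (new Date(obj.LastModified).getTime() > cutoffMs) {
        matches.push(obj);
      }
    }

    // A truncated response carries the token for the next page.
    token = page.IsTruncated ? page.NextContinuationToken : null;
  } while (token);

  return matches;
}
```

A call might look like listObjectsAfter(s3, { Bucket: "bucket1", Prefix: "file-" }, "2019-02-06T05:34:12.000Z"), which returns a promise for the matching objects. Memory use then scales with the number of matches rather than the size of the bucket, while the number of API requests still scales with the number of keys under the prefix.
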