3

Is it possible to list the files in an S3 bucket ordered by creation time, newest first, with pagination?

If it helps, I can store the creation timestamp as a suffix in each file name.

Any help would be appreciated.

John Rotenstein
kane.zorfy

2 Answers

0

On 29 November 2017, at AWS re:Invent, AWS announced a feature called S3 Select, which is available in preview.

S3 Select is a new Amazon S3 capability designed to pull out only the data you need from an object, dramatically improving the performance and reducing the cost of applications that need to access data in

Also,

During the Preview, you can use Amazon S3 Select through the available Presto connector, with AWS Lambda, or from any other application using the S3 Select SDK for Java or Python. This Preview is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore) AWS Regions

You can apply for the preview here.

Thanks

Kush Vyas
    From https://www.youtube.com/watch?v=2_8ZK_64hBc, I could see that S3 Select is for pulling a portion of an S3 object. My question is how to filter and order the objects themselves. Say I have 10 CSV files in my S3 bucket. I want to get the file names ordered by creation date, starting from the latest. – kane.zorfy Jan 09 '18 at 18:20
0

Maybe this link will help. It uses boto3, the AWS SDK for Python.

Basically, write a function which will:

  1. call the list_objects_v2 API
  2. loop through the returned objects
  3. store each object's 'Key' (name) and 'LastModified' attribute in a dictionary
  4. sort the dictionary items by timestamp and return them

    import boto3

    s3_client = boto3.client('s3')

    def sort_objects_in_bucket_by_timestamp(bucket_name):
        objects = {}
        # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
        for page in s3_client.get_paginator('list_objects_v2').paginate(Bucket=bucket_name):
            for obj in page.get('Contents', []):  # 'Contents' is absent if the bucket is empty
                objects[obj['Key']] = obj['LastModified']
        return sorted(objects.items(), key=lambda x: x[1])
    

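The function returns a list of (key, timestamp) pairs for the objects in the given bucket, ordered from oldest to newest by the 'LastModified' timestamp. To get the newest first, as the question asks, pass reverse=True to sorted().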
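The question also asks for pagination. As far as I know, S3's list calls have no sort-by-date option, so the sorting and paging have to happen on the client after the full listing is fetched. Here is a minimal sketch of that paging step; the name `page_newest_first` and its parameters are my own, not part of boto3, and the sample data is hand-made rather than read from a real bucket.

```python
from datetime import datetime, timezone

def page_newest_first(objects, page_size, page_number):
    """Return one page of (key, last_modified) pairs, newest first.

    `objects` is any iterable of (key, last_modified) pairs, for example
    the list returned by sort_objects_in_bucket_by_timestamp().
    Pages are numbered from 1.
    """
    if page_size < 1 or page_number < 1:
        raise ValueError("page_size and page_number must be >= 1")
    newest_first = sorted(objects, key=lambda item: item[1], reverse=True)
    start = (page_number - 1) * page_size
    return newest_first[start:start + page_size]

# Hand-made data standing in for a real bucket listing.
sample = [
    ("a.csv", datetime(2018, 1, 1, tzinfo=timezone.utc)),
    ("b.csv", datetime(2018, 1, 3, tzinfo=timezone.utc)),
    ("c.csv", datetime(2018, 1, 2, tzinfo=timezone.utc)),
]
first_page = page_newest_first(sample, page_size=2, page_number=1)
second_page = page_newest_first(sample, page_size=2, page_number=2)
```

This re-sorts on every call, which is fine for small buckets. For large ones you would probably want to sort once and cache the result.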

If you don't want to rely on the LastModified timestamp, change the function to parse the timestamp from the object key (the file name, which you said was viable) and use that as the dictionary value instead.
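Something like the following could do the parsing. It is only a sketch: the naming scheme `<name>_<YYYYMMDDTHHMMSS>.<ext>` is an assumption of mine, since the question does not say what the suffix looks like, and the helper names are hypothetical. Adjust the regular expression and the `strptime` format to whatever scheme you actually use.

```python
import re
from datetime import datetime

# Assumed (hypothetical) naming scheme: <name>_<YYYYMMDDTHHMMSS>.<ext>
_SUFFIX = re.compile(r"_(\d{8}T\d{6})(?:\.[^./]+)?$")

def timestamp_from_key(key):
    """Parse the timestamp suffix from an object key, or return None."""
    match = _SUFFIX.search(key)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    except ValueError:
        # Digits matched the pattern but are not a valid date/time.
        return None

def sort_keys_by_name_timestamp(keys):
    """Return the keys that carry a valid timestamp suffix, newest first."""
    stamped = [(key, timestamp_from_key(key)) for key in keys]
    stamped = [(key, ts) for key, ts in stamped if ts is not None]
    stamped.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in stamped]

keys = ["data_20180109T182000.csv", "data_20171129T090000.csv", "notes.txt"]
newest_first = sort_keys_by_name_timestamp(keys)
```

Keys without a recognizable suffix (such as `notes.txt` here) are skipped rather than raising an error. Whether that is the right behavior depends on your bucket.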

Varun Vembar