
I want to copy the latest CSV file, which has the date appended to its name, from an AWS S3 bucket to a local drive.

I have basic code that downloads files, but it downloads every file in the bucket. I only want the latest file, the one uploaded that day.

Stockburn
  • Thanks Adil B, I might be able to use this one. I was hoping to use the file name, as the date gets appended to the end of each file. – Stockburn Jan 31 '19 at 00:26
  • What do you mean by "Latest CSV file which has the date appended"? Can you please provide more information and some samples? Will the file be selected based on the filename (in which case, how should it choose?), or simply by grabbing the file most recently created in the bucket? – John Rotenstein Jan 31 '19 at 01:32
  • John, the files have the date (e.g. 31012019) appended to the end of the name, like some_file_31012019.csv. In the bucket there are files with 31012019, 30012019 and 29012019 as the appended date. I would only want the latest. Hope that clarifies. – Stockburn Jan 31 '19 at 01:46
  • How do you define "the latest"? Is it "the file with the latest date" (which requires interpreting all filenames), or "the latest file that was created" (which can be done by sorting by date)? Also, FYI, it is better to use a filename format of `YYYYMMDD` (eg `20190131`), since this sorts better and is an international standard. It avoids problems of interpreting `01052019`, which could be interpreted as 1-May or 5-Jan. – John Rotenstein Jan 31 '19 at 01:48
  • The latest date at the end of the file name. So today I would want some_file_20190131.csv and ignore all other files. Tomorrow, when the new file some_file_20190201.csv gets created, I would want it to copy that file. – Stockburn Jan 31 '19 at 01:52

1 Answer


Download latest object by modified date

If you only wish to grab the file that was last stored on Amazon S3, you could use:

aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --query 'sort_by(Contents, &LastModified)[-1].Key' --output text` .

This command does the following:

  • The inner aws s3api list-objects-v2 command lists the bucket, sorts the objects by LastModified in ascending order, then takes the last element ([-1]) and returns its Key (filename), which is the object that was most recently modified
  • The outer aws s3 cp command downloads that object to the local directory
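
The two steps above can also be run separately, which leaves room to check the result of the lookup before copying anything. The following is a minimal sketch, assuming a POSIX-style shell; the function name `download_latest` is a hypothetical helper, not part of the AWS CLI, and the bucket name is a placeholder.

```shell
#!/bin/sh
# Sketch: download the most recently modified object from a bucket.
# "download_latest" is a hypothetical helper name, not an AWS CLI command.
download_latest() {
    bucket="$1"
    dest="${2:-.}"

    # Step 1: ask S3 for the key of the object with the newest LastModified.
    key=$(aws s3api list-objects-v2 --bucket "$bucket" \
        --query 'sort_by(Contents, &LastModified)[-1].Key' --output text) || return 1

    # Guard against an empty or "None" result before copying.
    if [ -z "$key" ] || [ "$key" = "None" ]; then
        echo "no objects found in $bucket" >&2
        return 1
    fi

    # Step 2: copy that single object to the destination directory.
    aws s3 cp "s3://$bucket/$key" "$dest"
}
```

It could be called as `download_latest my-bucket .`. Quoting the key also keeps filenames that contain spaces intact. Note that with `--output text` the AWS CLI evaluates `--query` once per page of results, so a bucket with more than 1,000 objects may return more than one line; this sketch assumes a smaller bucket.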

Download latest object based on filename

If your filenames are like:

some_file_20190130.csv
some_file_20190131.csv
some_file_20190201.csv

then you can list by prefix and copy the last one:

aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --prefix some_file_ --query 'sort_by(Contents, &Key)[-1].Key' --output text` .

This command does the following:

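  • The inner aws s3api list-objects-v2 command lists the bucket, only returns objects whose Key starts with the prefix some_file_, sorts them by Key in ascending order, then takes the last element ([-1]) and returns its Key (filename). Because the date is in YYYYMMDD format, the last Key in the sort is the one with the latest date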
  • The outer aws s3 cp command downloads that object to the local directory
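
Sorting by Key only finds the newest file because `YYYYMMDD` dates sort in calendar order as plain text. The `DDMMYYYY` names mentioned in the comments on the question (for example `some_file_31012019.csv`) would not. The following is a minimal sketch of handling that case, assuming keys of the form `<prefix>_DDMMYYYY.csv` arrive one per line on standard input; the function name `latest_ddmmyyyy` is hypothetical.

```shell
#!/bin/sh
# Sketch: pick the newest key when names end in _DDMMYYYY.csv.
# Reads keys one per line on stdin and prints the newest key.
# "latest_ddmmyyyy" is a hypothetical helper name.
latest_ddmmyyyy() {
    # Prefix each matching key with its date rewritten as YYYYMMDD,
    # sort on that first field, keep the last line, then drop the field.
    sed -n 's/^\(.*_\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{4\}\)\.csv$/\4\3\2 &/p' |
        sort -k1,1 |
        tail -n 1 |
        cut -d' ' -f2-
}
```

The key list could come from something like `aws s3api list-objects-v2 --bucket my-bucket --prefix some_file_ --query 'Contents[].Key' --output text | tr '\t' '\n'`, and the selected key can then be passed to `aws s3 cp`. Keys that do not match the assumed pattern are skipped.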
John Rotenstein
  • Please note that I wrote these commands on a Mac. When you use them in PowerShell, you might need to convert the quotes. The inner commands should work under the Ubuntu shell, otherwise you might need to convert to PowerShell-compatible syntax. – John Rotenstein Jan 31 '19 at 02:09
  • Thanks John, I really appreciate the effort you have put in. I will have a crack at it and confirm once it works. – Stockburn Jan 31 '19 at 02:15