
We have ~400,000 files in a private S3 bucket that are inbound/outbound call recordings. The files follow a naming pattern that lets me search for numbers, both inbound and outbound. Note these recordings are in the Glacier storage class.

Using AWS CLI, I can search through this bucket and grep the files I need out. What I'd like to do is now initiate an S3 restore job to expedited retrieval (so ~1-5 minute recovery time), and then maybe 30 minutes later run a command to download the files.

My efforts so far:

aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32-

Retrieves the keys of about 200 files. I am unsure how to proceed next, as `aws s3 cp` won't work on objects in the Glacier storage class.

Cheers,

Jamie S

2 Answers


The AWS CLI has two separate commands for S3: s3 and s3api. s3 is a high-level abstraction with limited features, so for restoring files you'll have to use one of the commands available with s3api:

aws s3api restore-object --bucket exetel-logs --key your-key --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Expedited"}}'
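Since there are around 200 matching keys, you'd run that in a loop. A sketch, assuming you've first saved the matched keys to a file (the `keys.txt` name and the `awk` field extraction are illustrative; `awk '{print $4}'` is sturdier than counting characters with `cut`):

```shell
# Sketch: expedited restore for every matched key.
# keys.txt (one key per line) is an assumed input file, e.g. built with:
#   aws s3 ls s3://exetel-logs/ --recursive | grep 042222222 | awk '{print $4}' > keys.txt
if [ -f keys.txt ]; then
  while read -r key; do
    aws s3api restore-object \
      --bucket exetel-logs \
      --key "$key" \
      --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Expedited"}}'
  done < keys.txt
fi
```

Note that expedited retrievals carry an extra per-GB charge compared to standard retrievals.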

If you afterwards want to copy the files, but want to make sure you only copy files which have already been restored from Glacier, you can use the following snippet:

for key in $(aws s3api list-objects-v2 --bucket exetel-logs --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
  if [ "$(aws s3api head-object --bucket exetel-logs --key ${key} --query "contains(Restore, 'ongoing-request=\"false\"')")" == "true" ]; then
    echo ${key}
  fi
done
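Once that loop prints the restored keys (redirected into a file, say `restored-keys.txt`; the filename and local directory are assumptions), each one can be copied with plain `aws s3 cp`, which works on a Glacier object while its temporary restored copy exists:

```shell
# Sketch: download each restored key to a local directory.
# restored-keys.txt is assumed to hold the keys echoed by the loop above.
mkdir -p restored-calls
if [ -f restored-keys.txt ]; then
  while read -r key; do
    aws s3 cp "s3://exetel-logs/${key}" "restored-calls/${key##*/}"
  done < restored-keys.txt
fi
```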
Dunedan
  • Great answer, except that objects restored from the Glacier storage class don't subsequently show as `STANDARD`. I don't think there's a way to determine restoration status from an object listing; a `head-object` call would be needed. – Michael - sqlbot Aug 14 '17 at 13:19
  • Thanks for pointing out. I fixed the code snippet in the answer. – Dunedan Aug 17 '17 at 04:53

Have you considered using a high-level language wrapper for the AWS CLI? It will make these kinds of tasks easier to integrate into your workflows. I prefer the Python implementation (Boto 3). Here is example code for how to download all files from an S3 bucket.
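For the restore-and-download workflow in the question, a Boto3 version might look like this. This is a hedged sketch: the bucket name and search number come from the question, the helper names are mine, and error handling (e.g. a `RestoreAlreadyInProgress` error when re-requesting a restore) is omitted for brevity.

```python
# Hedged Boto3 sketch: list Glacier objects whose key contains a number,
# request expedited restores, then later download whatever has finished.
# Bucket name and number are from the question; helper names are illustrative.

def is_restored(restore_header):
    """True once S3's Restore header reports the restore has finished."""
    return 'ongoing-request="false"' in (restore_header or "")

def restore_and_fetch(number, bucket="exetel-logs"):
    import boto3  # imported lazily so is_restored stays dependency-free
    s3 = boto3.client("s3")

    # Find Glacier-class keys containing the number.
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj.get("StorageClass") == "GLACIER" and number in obj["Key"]:
                keys.append(obj["Key"])

    # Request expedited restores (~1-5 minutes, at extra cost).
    for key in keys:
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": 1, "GlacierJobParameters": {"Tier": "Expedited"}},
        )

    # Later (e.g. ~30 minutes on), download whatever has finished restoring.
    for key in keys:
        head = s3.head_object(Bucket=bucket, Key=key)
        if is_restored(head.get("Restore")):
            s3.download_file(bucket, key, key.rsplit("/", 1)[-1])
```

The `Restore` header check mirrors the `head-object` approach in the other answer: the header is absent before a restore is requested, reports `ongoing-request="true"` while in progress, and `ongoing-request="false"` once the temporary copy is available.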

rvd