15

I have about 50 GB worth of files stored in S3. Yesterday I stupidly added a lifecycle rule to transition files more than 30 days old from S3 to Glacier, not realising that this would break the public links to the original files.

I actually really need these files to stay in S3 as they are images and drawings that are linked on our website.

I have requested a restore of the files from Glacier; however, as far as I understand, the restored files are only available for a limited number of days before they go back to Glacier.

I was thinking that I am going to have to create a new bucket, then copy the files across to it and then link that new bucket up to my website.

My questions:

  1. I was wondering if there is a way to do this without having to copy my files to a new bucket?

  2. If I just change the storage class of the file once it is back in S3, will this stop it from going back to Glacier?

  3. If I have to copy the files to a new bucket I'm assuming that these copies won't randomly go back to Glacier?

I'm quite new to S3 (as you can probably tell by my bone-headed mistake), so please try to be gentle.

Pete Dermott

6 Answers

14

You don't need a new bucket. You restore the objects from Glacier (temporarily) and then overwrite them using the COPY operation, which essentially creates new objects that will stay around. Needless to say, you'll also need to disable your aging-away-to-Glacier lifecycle rule.
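For example (a minimal sketch, assuming your CLI credentials can manage the bucket and using <bucketName> as a placeholder), you can inspect the current lifecycle configuration and then remove it:

aws s3api get-bucket-lifecycle-configuration --bucket <bucketName>

# Note: delete-bucket-lifecycle removes ALL lifecycle rules on the bucket. If you
# have other rules you want to keep, edit the configuration and re-apply it with
# put-bucket-lifecycle-configuration instead.
aws s3api delete-bucket-lifecycle --bucket <bucketName>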

Temporary restore:

aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key <keyName>

Replace with copied object:

aws s3 cp s3://bucketName/keyName s3://bucketName/keyName --force-glacier-transfer --storage-class STANDARD

Docs say:

The transition of objects to the GLACIER storage class is one-way.

You cannot use a lifecycle configuration rule to convert the storage class of an object from GLACIER to STANDARD or REDUCED_REDUNDANCY storage classes. If you want to change the storage class of an archived object to either STANDARD or REDUCED_REDUNDANCY, you must use the restore operation to make a temporary copy first. Then use the copy operation to overwrite the object as a STANDARD, STANDARD_IA, ONEZONE_IA, or REDUCED_REDUNDANCY object.

Ref.

...going back to Glacier

To be pedantic for a moment, the archived objects aren't moving between S3 and Glacier: they're permanently in Glacier, and temporary copies are made in S3. It's important to note that you're paying for both Glacier and S3 storage while an object is temporarily restored. Once your retention period expires, the S3 copies are deleted.
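You can see that state on an individual object with head-object (a small sketch, reusing the <bucketName>/<keyName> placeholders from above); the Restore field shows whether the temporary copy is still being prepared and when it will expire:

aws s3api head-object --bucket <bucketName> --key <keyName> --query Restore --output text
# While the restore is in progress this prints: ongoing-request="true"
# Once the temporary copy exists it prints something like:
#   ongoing-request="false", expiry-date="Fri, 10 Aug 2018 00:00:00 GMT"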

RaGe
    Thanks, I understand how this works a little better now. In the end I needed to use the command `aws s3 cp s3://mybucket s3://mybucket --force-glacier-transfer --storage-class STANDARD --recursive` in order to make my copies. – Pete Dermott Aug 06 '18 at 07:43
  • the `--force-glacier-transfer` gives me this error `An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's storage class` – CpILL May 13 '21 at 21:50
  • Did you do the restore operation before the copy operation? – RaGe May 14 '21 at 23:03
9

To provide a complete answer I've combined two other SO posts:

Step one, temporarily restore everything:

  1. Get a listing of all GLACIER files (keys) in the bucket (you can skip this step if you are sure all files are in Glacier).

    aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk -F '\t' '{print $2}' > glacier-restore.txt

  2. Create a shell script and run it, replacing <bucketName> with your bucket name.

    #!/bin/bash
    
    # Request a temporary 7-day restore for every key listed in glacier-restore.txt
    while IFS= read -r key; do
        echo "Begin restoring ${key}"
        aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "${key}"
        echo "Done restoring ${key}"
    done < glacier-restore.txt

Credit Josh & @domenic-d.
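The restore requests are asynchronous, so before moving on to step two you may want to confirm they have completed. A minimal sketch (assuming the same glacier-restore.txt and <bucketName> placeholder as above) that polls each key until its temporary copy is ready:

# Wait until every key's Restore field reports ongoing-request="false"
while IFS= read -r key; do
    until aws s3api head-object --bucket <bucketName> --key "${key}" \
            --query Restore --output text | grep -q 'ongoing-request="false"'; do
        echo "Still waiting on ${key}..."
        sleep 300
    done
done < glacier-restore.txt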

Step two for permanent restore:

aws s3 cp s3://mybucket s3://mybucket --force-glacier-transfer --storage-class STANDARD --recursive

done and done.

Credit to @pete-dermott's comment here.

Maros
David
    I needed to add `-F '\t'` flag to the `awk` command to make it work for filenames with spaces. – Maros Jun 04 '20 at 19:53
2

I used the following command to restore an S3 object from the Amazon Glacier storage class:

aws s3api restore-object --bucket bucket_name --key dir1/sample.obj --restore-request '{"Days":25,"GlacierJobParameters":{"Tier":"Standard"}}'

Here, a temporary copy of the object is made available for the duration specified in the restore request, such as the 25 days used in the command above.

If the JSON syntax used in the example results in an error on a Windows client, replace the restore request with the following syntax:

--restore-request Days=25,GlacierJobParameters={"Tier"="Standard"}

Note: This will only create a temporary copy of the object for the specified duration. You have to use the copy operation to overwrite the object as a Standard object.

To change the object's storage class to Amazon S3 Standard use the following command:

aws s3 cp s3://bucket_name/dir1 s3://bucket_name/dir1 --storage-class STANDARD --recursive --force-glacier-transfer

This will recursively copy and overwrite existing objects with the Amazon S3 Standard storage class.

snehab
1

In case someone wants to restore all objects within a bucket, here are some PowerShell Core commands to do so.

If you need to, install PowerShell Core first. Then install the AWS Tools for PowerShell on Windows, or on Linux or macOS, and install the AWS.Tools.S3 module via `Install-AWSToolsModule AWS.Tools.S3`.

Run the restore operation for each object within the bucket:

Get-S3Object -BucketName arq-backup-s3 | ForEach-Object -Parallel {
    aws s3api restore-object --bucket $_.BucketName --key $_.Key --restore-request 'Days=14,GlacierJobParameters={Tier=Standard}'
}

Get the current state of how many objects have already been restored. The code can run for quite a long time, depending on how many objects you want to restore.

Get-S3Object -BucketName arq-backup-s3 | ForEach-Object -Parallel {
    $obj = aws s3api head-object --bucket $_.BucketName --key $_.Key | ConvertFrom-Json
    # Once a restore completes, the Restore field also contains an expiry-date,
    # so match on the prefix rather than the exact string.
    $restoredCount = ($obj | Where-Object -Property Restore -like 'ongoing-request="false"*' | Measure-Object).Count
    $workItems = ($obj | Where-Object -Property Restore -like 'ongoing-request="true"*' | Measure-Object).Count

    return [pscustomobject]@{
        Done = $restoredCount
        Missing = $workItems
    }
} | Measure-Object -Property Done, Missing -Sum

It can take between 3 and 5 hours to restore an object with the Standard retrieval tier.
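If that is too slow for a few urgent objects, Glacier also offers an Expedited retrieval tier, which typically completes in minutes at a higher cost. For a single object the request would look something like the following sketch (the key below is just an illustrative placeholder):

aws s3api restore-object --bucket arq-backup-s3 --key path/to/important.jpg --restore-request 'Days=14,GlacierJobParameters={Tier=Expedited}'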

Finally, you have to overwrite each object to permanently put it back into your preferred storage class:

Get-S3Object -BucketName arq-backup-s3 | ForEach-Object -Parallel {
    aws s3 cp s3://$($_.BucketName)/$($_.Key) s3://$($_.BucketName)/$($_.Key) --force-glacier-transfer --storage-class STANDARD
}

I ran the above code to permanently restore 6,337 objects from Glacier, 65 GB in total.

Aziza Kasenova
clowa
0

To restore everything, it is now (2022) possible to make use of S3 Batch Operations (see "Creating an S3 Batch Operations job" in the AWS docs):

  1. Create a list of files with a list command similar to this:

aws s3api list-objects-v2 --bucket mybucket --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print "mybucket,", $2}' > mybucket.csv

Pay attention to spaces in the csv file: the awk command above inserts a space after ',' before the file name, and it is interpreted as a file name starting with a space. It can be removed with `sed -i 's/, /,/' mybucket.csv`.

  2. Upload mybucket.csv to S3 somewhere and use it as the manifest for a restore job (see the sketch below).
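A sketch of what creating that restore job could look like from the CLI; the account ID, role ARN, manifest ETag, and report prefix below are placeholders you would replace with your own values:

# The batch job only initiates the temporary restores; you still need the copy
# step (as in the other answers) to change the storage class afterwards.
aws s3control create-job \
    --account-id 111122223333 \
    --operation '{"S3InitiateRestoreObject":{"ExpirationInDays":7,"GlacierJobTier":"BULK"}}' \
    --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::mybucket/mybucket.csv","ETag":"<manifest-etag>"}}' \
    --report '{"Bucket":"arn:aws:s3:::mybucket","Prefix":"batch-restore-reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"FailedTasksOnly"}' \
    --priority 10 \
    --role-arn arn:aws:iam::111122223333:role/<batch-operations-role> \
    --no-confirmation-required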
pmosconi
0

I've requested a restore of the files from Glacier, however as far as I understand this has limits for the number of days that the files will be available before they go back to Glacier.

There are two limits measured in days. When you request file retrieval you in fact have two copies: one remains in Glacier and the second is a temporary Standard-class copy, although in the web console the two appear as a single file. So the limit you're speaking about says how long the "standard" copy exists.

The second limit in days (and in file size, for Glacier Instant Retrieval) is the minimum duration (and size) you'll be billed for. You'll pay for that minimum even if you delete the file earlier (or if the file is smaller than 128 KB), but you definitely can change the storage class or delete it.

Now (2022/06) changing the storage class back to Standard is much simpler. To overwrite an existing object with the Amazon S3 Standard storage class, run the following command:

aws s3 cp s3://awsexamplebucket/dir1/example.obj s3://awsexamplebucket/dir1/example.obj --storage-class STANDARD

To perform a recursive copy for an entire prefix and overwrite existing objects with the Amazon S3 Standard storage class, run the following command:

aws s3 cp s3://awsexamplebucket/dir1/ s3://awsexamplebucket/dir1/ --storage-class STANDARD --recursive --force-glacier-transfer

Doc: open https://aws.amazon.com/premiumsupport/knowledge-center/restore-s3-object-glacier-storage-class/ then look for "Change the object's storage class to Amazon S3 Standard"

If I just change the storage class of the file once it is back in S3 will this stop it from going back to Glacier?

Please remember to delete/modify the lifecycle rule or the files will be moved to Glacier again.
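For example (a sketch; lifecycle.json and the rule shown in it are placeholders for whatever configuration your bucket actually has), you can keep the rule but switch it off by re-applying the configuration with its Status set to Disabled:

# lifecycle.json: your existing rule with "Status" changed to "Disabled", e.g.
# {
#   "Rules": [
#     {
#       "ID": "archive-to-glacier",
#       "Filter": {"Prefix": ""},
#       "Status": "Disabled",
#       "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
#     }
#   ]
# }
aws s3api put-bucket-lifecycle-configuration --bucket awsexamplebucket --lifecycle-configuration file://lifecycle.json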

If I have to copy the files to a new bucket I'm assuming that these copies won't randomly go back to Glacier?

Nothing there is random :) No, they won't unless you have such a lifecycle rule. But if you copy them to another dir in the same bucket they can be affected by the existing rule.

Putnik