4

Is there a command line way to restore data from Glacier? So far I have tried:

s3cmd restore --recursive s3://mybucketname/folder/

aws s3 ls s3://<bucket_name> | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request Days=<days> --bucket <bucket_name> --key

But, no help there. PS: I know we can do this via the console.

jarmod
Shabbir Bata
  • When you say "no help here", what do you mean? Did the command fail? Did it succeed, but you can't find the S3 object? You should be aware from the documentation that standard Glacier retrieval time is typically 3-5 hours, unless you request it be expedited (in which case you pay more). – jarmod Oct 10 '17 at 16:11

4 Answers

1

You can use the lower level aws s3api command:

aws s3api restore-object --request-payer requester \
                         --key path/to/key.blob \
                         --bucket my-bucket \
                         --cli-input-json "$(cat request.json)"

And then set your parameters inside request.json, for instance:

{
    "RestoreRequest": {
     "Days": 1,
     "GlacierJobParameters": {
         "Tier": "Standard"
     }
    }
}
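
If you'd rather not keep a separate request.json, the same parameters can also be passed inline using the CLI's shorthand syntax (same placeholder bucket and key as above):

aws s3api restore-object --bucket my-bucket \
                         --key path/to/key.blob \
                         --restore-request 'Days=1,GlacierJobParameters={Tier=Standard}'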

Once the restore request is initiated, you will have to call head-object to determine its restore status:

aws s3api head-object --key path/to/key.blob \
                      --bucket my-bucket \
                      --request-payer requester
{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"true\"",
    "LastModified": "Thu, 30 May 2019 22:43:48 GMT",
    "ContentLength": 1573320976,
    "ETag": "\"5e9bae0592655103e72d0c026e643184-94\"",
    "ContentType": "application/x-gzip",
    "Metadata": {
        "digest-md5": "7ace7afadfaec591a7dcff2b942df701",
        "import-digests": "md5"
    },
    "StorageClass": "GLACIER",
    "RequestCharged": "requester"
}

When Restore contains ongoing-request="false", the restoration is complete. The temporary copy in S3 will last for the number of days you specified in the restore request. Note that StorageClass still reads GLACIER (or DEEP_ARCHIVE) for a restored object, even after its restoration is complete, which is unintuitive.
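
If you want to wait for that from a script, a rough sketch (using the same placeholder bucket/key, and assuming a restore request has already been submitted) is to poll head-object until the flag flips:

until aws s3api head-object --bucket my-bucket --key path/to/key.blob \
        --query Restore --output text | grep -q 'ongoing-request="false"'; do
    sleep 300    # standard-tier restores typically take hours, so poll slowly
done
echo "restore complete"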

If you wish to keep that copy in S3 permanently, i.e. change the storage class from GLACIER to STANDARD, you will need to put/copy the restored object (potentially over itself) as a new object. It's annoying.
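
For reference, once the restore has completed, that in-place copy can be done with copy-object by overwriting the object with a new storage class (same placeholder bucket/key; copy-object only handles objects up to 5 GB, beyond which you need a multipart copy or aws s3 cp):

aws s3api copy-object --copy-source my-bucket/path/to/key.blob \
                      --bucket my-bucket \
                      --key path/to/key.blob \
                      --storage-class STANDARD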

Note: The --request-payer requester is optional. I use that in my setup, but if you're the owner of the bucket, you don't need it.
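
To come back to the original question of restoring everything under a prefix, one possible sketch (untested; bucket and prefix are placeholders) is to list the GLACIER-class keys and feed each one into the same restore-object call:

aws s3api list-objects-v2 --bucket my-bucket --prefix folder/ \
    --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text |
while IFS= read -r key; do
    aws s3api restore-object --bucket my-bucket --key "$key" \
        --restore-request 'Days=1,GlacierJobParameters={Tier=Standard}'
done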

init_js
0

Unfortunately, it is not possible. You can access objects that have been archived to Amazon Glacier only by using Amazon S3.

Refer: http://docs.aws.amazon.com/AmazonS3/latest/user-guide/restore-archived-objects.html

Anush Arvind
  • Yes, I am aware of this particular way. However, even via the console the limitation is that you can't restore a folder. Restoration is at the object level, not the bucket level, which I believe is not a good way to do it. – Shabbir Bata Oct 10 '17 at 16:28
  • S3 does not have folders. A folder is a logical concept and not something that actually exists in S3. All objects in S3 are just keys in a flat name space. The fact that folders appear is just a logical concept using the file name character "/" as a separator in the client software. – John Hanley Oct 10 '17 at 19:17
  • This answer is no longer accurate as of 2022. `s3cmd` supports recursive S3 Glacier restores using syntax such as `s3cmd restore --recursive s3://mybucketname/folder/`. – Piotr Andruszkow Apr 22 '22 at 10:09
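
For completeness, the s3cmd variant mentioned in the comment above would look something like this, assuming a recent s3cmd release that supports the --restore-days and --restore-priority options:

s3cmd restore --recursive --restore-days=7 --restore-priority=standard s3://mybucketname/folder/
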
0

Here is an example using Java to restore an object. Starting from this link, you can do something similar in your language of choice.

Restore an Archived Object Using the AWS SDK for Java

John Hanley
0

You can use the Python (boto3) code below to restore Glacier data to S3. It lists every object under a bucket/prefix, keeps only the GLACIER-class ones, and submits restore requests in hourly batches capped by --max-rate-mb.

import argparse
import sys
import time

import boto3

parser = argparse.ArgumentParser()
parser.add_argument('--max-rate-mb', action='store', type=int, default=10000, help='The maximum rate in MB/h to restore files at. Files larger than this will not be restored.')
parser.add_argument('--restore-days', action='store', type=int, default=30, help='How many days restored objects will remain in S3.')
parser.add_argument('--restore-path', action='store', help='The bucket/prefix to restore from')
parser.add_argument('--pretend', action='store_true', help='Do not execute restores')
parser.add_argument('--estimate', action='store_true', help='When pretending, do not check for already-restored files')
args = parser.parse_args()

if not args.restore_path:
    print('No restore path specified.')
    sys.exit(1)

# Split "bucket/prefix" into bucket name and key prefix.
if '/' in args.restore_path:
    BUCKET, PREFIX = args.restore_path.split('/', 1)
else:
    BUCKET = args.restore_path
    PREFIX = ''

RATE_LIMIT_BYTES = args.max_rate_mb * 1024 * 1024

# Replace the placeholders with real credentials, or drop them to use the default credential chain.
s3 = boto3.Session(aws_access_key_id='<ACCESS_KEY>', aws_secret_access_key='<SECRET_KEY>').resource('s3')
bucket = s3.Bucket(BUCKET)

# List every object under the prefix.
objects = []
objcount = 0
for objpage in bucket.objects.filter(Prefix=PREFIX).page_size(100).pages():
    for obj in objpage:
        objcount += 1
        print(obj)
        objects.append(obj)
    print('Found {} objects.'.format(objcount))
print()

# Largest first, and keep only objects archived in Glacier.
objects.sort(key=lambda x: x.size, reverse=True)
objects = [o for o in objects if o.storage_class == 'GLACIER']

if objects:
    obj = objects[0]
    print('The largest object found is of {} size: {:14,d}  {:1s}  {}'.format(
        ('a restorable' if obj.size <= RATE_LIMIT_BYTES else 'an UNRESTORABLE'),
        obj.size, obj.storage_class[0], obj.key))
    print()

while objects:
    # Build a batch of objects whose combined size fits within the hourly rate limit.
    current_set = []
    current_set_total = 0
    unreported_unrestorable_objects = []
    i = 0
    while i < len(objects):
        obj = objects[i]

        if obj.size > RATE_LIMIT_BYTES:
            unreported_unrestorable_objects.append(obj)
        elif unreported_unrestorable_objects:
            # No longer accumulating these.  Print the ones we found.
            print('Some objects could not be restored due to exceeding the hourly rate limit:')
            for unrestorable in unreported_unrestorable_objects:
                print('- {:14,d}  {:1s}  {}'.format(unrestorable.size, unrestorable.storage_class[0], unrestorable.key))
            print()
            unreported_unrestorable_objects = []

        if current_set_total + obj.size <= RATE_LIMIT_BYTES:
            if not args.pretend or not args.estimate:
                # Skip objects that already have a restore request in progress or completed.
                if obj.Object().restore is not None:
                    objects.pop(i)
                    continue
            current_set.append(obj)
            current_set_total += obj.size
            objects.pop(i)
            continue
        i += 1

    for obj in current_set:
        print('{:14,d}  {:1s}  {}'.format(obj.size, obj.storage_class[0], obj.key))
        if not args.pretend:
            obj.restore_object(RestoreRequest={'Days': args.restore_days})

    print('{:s} Requested restore of {:d} objects consisting of {:,d} bytes.  {:d} objects remaining.  {:,d} bytes of hourly restore rate wasted'.format(
        time.strftime('%Y-%m-%d %H:%M:%S'), len(current_set), current_set_total,
        len(objects), RATE_LIMIT_BYTES - current_set_total))
    print()
    if not objects:
        break
    if not args.pretend:
        # Wait just over an hour before submitting the next batch.
        time.sleep(3690)

Command to run the script:

python restore_glacier_data_to_s3.py --restore-path s3-bucket-name/folder-name/
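
To preview which objects would be restored without submitting any restore requests, you can first run it with the script's --pretend flag:

python restore_glacier_data_to_s3.py --restore-path s3-bucket-name/folder-name/ --pretend
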
Sumit Saurabh