
I am trying to upload some .tar.gz files to AWS S3 Glacier using the upload-archive command (docs).

I ran the upload-archive command over one day ago on a 29 GB .tar.gz file like so:

aws glacier upload-archive --vault-name my-vault --account-id - --archive-description "my description" --body my-file.tar.gz

I checked today and found that it still had not finished.


My question has two parts:

  • Is there some way to see that the command is still running?
    • A progress bar would be awesome
  • Are there any ways (or alternate methods) to speed up the upload-archive?

FYI, I am using aws-cli==2.0.17. Thank you in advance for any help!


**Edit**

After running for two days, the upload-archive command errored out with the below message:

An error occurred (InvalidParameterValueException) when calling the UploadArchive operation: Invalid Content-Length: 30957118

This led me to aws-cli #3413. The posts there are in agreement with the answers below.

Intrastellar Explorer

3 Answers


If you do not need Amazon S3 Glacier-specific features such as vault locks and vault policies, you may consider using Amazon S3 with a storage class of Glacier.

This class stores your objects in the Amazon S3 Glacier backend, but provides the easy and familiar interface of S3. Some benefits compared to using Amazon S3 Glacier directly:

  • file names are preserved in S3 (in Glacier, your file names get scrambled),
  • easy multipart upload using the aws s3 CLI,
  • easy retrieval of archived objects,
  • S3 object lifecycle rules, which can automatically transition your objects to the Glacier storage class, or from Glacier to Deep Archive.
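For illustration, an upload through the ordinary S3 interface can be sketched with boto3. This is a minimal sketch, not the answerer's own code: the bucket and key names are hypothetical, and the helper that builds `ExtraArgs` is an assumption for validation purposes. `upload_file` switches to multipart upload automatically for large files.

```python
def storage_class_args(storage_class="DEEP_ARCHIVE"):
    """Build the ExtraArgs dict for boto3's upload_file, restricted to
    the archival storage classes discussed here (helper is hypothetical)."""
    allowed = {"GLACIER", "DEEP_ARCHIVE"}
    if storage_class not in allowed:
        raise ValueError(f"not an archival storage class: {storage_class}")
    return {"StorageClass": storage_class}


def upload_archive(path, bucket, key, storage_class="DEEP_ARCHIVE"):
    """Upload a local file to S3 under an archival storage class."""
    import boto3  # deferred import; running this needs AWS credentials

    s3 = boto3.client("s3")
    # For a 29 GB file, upload_file performs a managed multipart upload.
    s3.upload_file(path, bucket, key, ExtraArgs=storage_class_args(storage_class))


# usage (names are placeholders):
# upload_archive("my-file.tar.gz", "my-bucket", "backups/my-file.tar.gz",
#                storage_class="GLACIER")
```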
Marcin
    Thank you @Marcin! Whereas that doesn't directly answer my question, it provides a much better solution. I will be using this, thank you!! – Intrastellar Explorer Jun 01 '20 at 18:34
  • Actually, I discovered `rclone` makes this easy, too: https://rclone.org/s3/, I didn't have to really write any code to do a multipart upload – Intrastellar Explorer Jun 02 '20 at 01:29
  • Thanks for letting me know. So you are using S3 now with storage class of glacier, or just directly Glacier service? – Marcin Jun 02 '20 at 01:36
  • I thought the points you made were good, especially the file name preservation. I am using S3 with a storage class of S3 Glacier Deep Archive. I also discovered `rclone` has an arg `--progress` that displays progress/stats during upload. – Intrastellar Explorer Jun 02 '20 at 02:34
  • @IntrastellarExplorer Glad to hear it worked out :-) – Marcin Jun 02 '20 at 02:41
  • @Marcin Thank you, I didn't know this was an option. But can you expand on how S3 Glacier differs from S3 with storage class Glacier? What are vaults good for? Is the pricing of the two options exactly the same? It almost looks as if S3 Glacier archives are just the same as S3 Glacier bucket objects, but you have to do everything programmatically, which is very inconvenient. – Davi Doro Aug 19 '21 at 19:31

Take a look at multipart-upload to Glacier.

This example initiates a multipart upload to a vault named my-vault with a part size of 1 MiB:

aws glacier initiate-multipart-upload --account-id - --part-size 1048576 --vault-name my-vault
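Two details this flow leaves to you are picking a valid part size (Glacier requires a power of two between 1 MiB and 4 GiB, and at most 10,000 parts per upload) and computing the SHA-256 tree hash that `complete-multipart-upload` expects. A stdlib-only sketch of both, assuming the whole payload fits in memory (a real 29 GB upload would hash the file in a streaming fashion):

```python
import hashlib

MIB = 1024 * 1024


def choose_part_size(archive_size, max_parts=10_000):
    """Smallest power-of-two part size (1 MiB .. 4 GiB) that keeps the
    archive within Glacier's 10,000-part limit."""
    size = MIB
    while archive_size / size > max_parts:
        size *= 2
    if size > 4 * 1024 * MIB:
        raise ValueError("archive too large for Glacier multipart limits")
    return size


def tree_hash(data: bytes) -> str:
    """SHA-256 tree hash over 1 MiB leaves, as Glacier's
    x-amz-sha256-tree-hash checksum is defined."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        # hash adjacent pairs; an odd leftover digest is carried up as-is
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [hashlib.sha256(b"".join(p)).digest() for p in pairs]
    return level[0].hex()
```

For a 29 GB archive, `choose_part_size` lands on 4 MiB parts, which is why the 1 MiB example above would exceed the part limit for a file that size.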

As for checking whether an existing upload is still progressing, you can look at the network activity on the uploading client and see whether there is any bandwidth towards AWS IP addresses.

Docs

Adi Dembak

As Adi Dembak suggested, I would use a multipart upload instead. With this approach you can use the ProgressListener API to track the upload's progress. See the following link for more details: https://docs.amazonaws.cn/en_us/AmazonS3/latest/dev/HLTrackProgressMPUJava.html
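That link covers the Java SDK; the same idea in Python is the `Callback` parameter of boto3's `upload_file`, which is invoked with the number of bytes transferred so far. A sketch (the file and bucket names are placeholders, not from the question):

```python
import sys
import threading


class ProgressPercentage:
    """Accumulates bytes transferred and prints cumulative progress;
    pass an instance as Callback= to boto3's upload_file."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.seen = 0
        # boto3 may invoke the callback from multiple worker threads
        self._lock = threading.Lock()

    def __call__(self, bytes_transferred):
        with self._lock:
            self.seen += bytes_transferred
            pct = 100 * self.seen / self.total if self.total else 100.0
            sys.stdout.write(f"\r{self.seen} / {self.total} bytes ({pct:.1f}%)")
            sys.stdout.flush()


# usage (hypothetical names; requires AWS credentials):
# import os, boto3
# size = os.path.getsize("my-file.tar.gz")
# boto3.client("s3").upload_file(
#     "my-file.tar.gz", "my-bucket", "my-file.tar.gz",
#     Callback=ProgressPercentage(size))
```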

Tony