25

Question:

Is there a simple way to access a data file stored on Amazon S3 directly from the command line?

Motivation:

I'm loosely following an online tutorial where the author links to the following URL:

s3://bml-data/churn-bigml-80.csv

It is a simple CSV file, but I can't open it using my web browser, or with curl. The tutorial opens it with BigML, but I want to download the data for myself. Some googling tells me that there are a number of Python and Scala libraries designed for S3 access ... but it would be really nice to open or download the file more directly.

I use a Mac and am a big fan of Homebrew, so the perfect solution (for me) would work on this system.

Bonus Question:

Is there any good way to see the contents of an Amazon S3 bucket (that I don't own)?

The nature of the file (80% of a particular data-set) makes me suspect that there may be a churn-bigml-20.csv file hiding somewhere out there. My automatic approach would be to try to curl / open the expected file ... the solution to the first question will allow me to check this hunch, but in an ugly way. If anyone knows of a way to remotely explore the contents of a specific S3 bucket, then that would be very useful. Again, exploring Google and SO tells me that there are libraries for this, but a more direct approach would be useful.

John Rotenstein
GnomeDePlume

3 Answers

31

The AWS Command Line Interface (CLI) is a unified tool to manage AWS services, including accessing data stored in Amazon S3.

The AWS Command Line Interface is available for Windows, Mac and Linux.
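On a Mac it can be installed with Homebrew (as GnomeDePlume also notes in the comments) and pointed at your credentials with aws configure. A minimal sketch, assuming you already have an AWS access key pair:

brew install awscli
aws configure   # prompts for access key ID, secret access key, default region and output format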

If the bucket owner has granted public permissions for ListBucket, then you can list the contents of the bucket, e.g.:

aws s3 ls s3://bml-data

If the bucket owner has granted public permissions for GetObject, then you can copy an object:

aws s3 cp s3://bml-data/churn-bigml-80.csv churn-bigml-80.csv

Both of these commands work successfully for me.
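If you don't have AWS credentials configured, many public buckets can still be read anonymously by adding the --no-sign-request flag. A sketch of the same two commands (whether this works depends on the bucket's policy):

aws s3 ls s3://bml-data --no-sign-request
aws s3 cp s3://bml-data/churn-bigml-80.csv churn-bigml-80.csv --no-sign-request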


John Rotenstein
  • This is great! And aws is also installable using brew: `brew install awscli` – GnomeDePlume Nov 26 '14 at 00:07
  • any reason why this is happening? curl https://ryft-public-sample-data.s3.us-east-1.amazonaws.com/DNS/dns1.log -s -o dns1.log works but *not* aws s3 cp s3://ryft-public-sample-data/DNS/dns1.log . fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden – shrek Oct 12 '22 at 15:46
  • @shrek The `aws s3 cp s3://ryft-public-sample-data/DNS/dns1.log .` command works fine on my computer. It downloaded the file. – John Rotenstein Oct 12 '22 at 21:08
4

There's a neat tool called s3cmd that will do this.

  • It works on Mac (with the homebrew package manager)
  • It lets you download from Amazon S3 to your local machine
  • It lets you browse Amazon S3 buckets (even when you don't own them)

Installation and Setup

brew install s3cmd

Configuring s3cmd requires that you have an Amazon S3 account. The account is free, but you need to sign up for one on the AWS site.

s3cmd --configure

Configuration involves specifying your access / secret key pair, and a few other details (I used defaults for everything). If you want to use HTTPS then you can install gpg with brew, and set a few more configuration options at this point. Be warned - the gpg_passphrase that you use is stored in a local plain-text configuration file!
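For reference, the answers to that wizard are written to a plain-text file at ~/.s3cfg. A minimal sketch of the relevant entries (the key values below are placeholders):

[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
use_https = True
# the passphrase ends up here in plain text, hence the warning above
gpg_passphrase = my-secret-passphrase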

Use:

Now for the exciting bit: downloading my file to desktop!

s3cmd get s3://bml-data/churn-bigml-80.csv ~/Desktop

Listing the contents of the remote bucket:

s3cmd ls s3://bml-data/

Additional Functionality:

This is beyond the scope of the question but seems worth mentioning: s3cmd can do other things too, like putting data into a bucket (and making it public with the -P flag) and deleting files; see the manual for more information:

s3cmd -P put ~/Desktop/my-file.png  s3://mybucket/
s3cmd del s3://mybucket/my-file-to-delete.png
man s3cmd

Credit:

Thanks to Neil Gee for his tutorial on s3cmd.

GnomeDePlume
0

If you just want to download the file from a Linux terminal, you first have to make the file public.

FYI: once it is public, everyone will have whichever of the following permissions you grant: read the object, or read and write.

Once this has been done, right-click on the file >> Download as >> and you should see a popup.

Right-click the download link and choose "Copy link location", then paste it into a text editor. Keep only the part of the link before the question mark, for example:

https://s3-ap-northeast-1.amazonaws.com/backup/pan.hosting/2017-01-15/earth.tar.gz?response-content-disposition=attachment&X-Amz-Security-Token=%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwTEigAJ4GvimzYt3gQegUHaRSe%2BnLWeND%

Then enter the command below in your terminal.

wget https://s3-ap-northeast-1.amazonaws.com/backup/pan.hosting/2017-01-15/earth.tar.gz
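If you would rather not click through the console, and you have the AWS CLI plus credentials that are allowed to change the object's ACL, it can typically be made public from the command line instead. A sketch using the (hypothetical) bucket and key from the URL above:

aws s3api put-object-acl --bucket backup --key pan.hosting/2017-01-15/earth.tar.gz --acl public-read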

Aswin Mohanan