download file using s3fs

Question

I am trying to download a CSV file from an S3 bucket using the s3fs library. I have noticed that writing a new CSV with pandas alters the data in some way, so I want to download the file directly, in its raw state.

The documentation has a download function but I do not understand how to use it:

download(self, rpath, lpath[, recursive]): Alias of FilesystemSpec.get.

Here's what I tried:

import pandas as pd
import datetime
import os
import s3fs
import numpy as np

#Creds for s3
fs = s3fs.S3FileSystem(key=mykey, secret=mysecretkey)
bucket = "s3://mys3bucket/mys3bucket"
files = fs.ls(bucket)[-3:]


#download files:
for file in files:
    with fs.open(file) as f:
        fs.download(f,"test.csv")

AttributeError: 'S3File' object has no attribute 'rstrip'

Jacky · Accepted Answer · 2020-07-22T18:56:51.510

11

for file in files:
    fs.download(file,'test.csv')

Modified to download all files in the directory:

import pandas as pd
import datetime
import os
import s3fs
import numpy as np

#Creds for s3
fs = s3fs.S3FileSystem(key=mykey, secret=mysecretkey)
bucket = "s3://mys3bucket/mys3bucket"

#files references the entire bucket.
files = fs.ls(bucket)

for file in files:
    fs.download(file,'test.csv')
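
The loop above writes every object to the same test.csv, so only the last download survives. Here is a minimal sketch of one way around that, assuming each object should keep the last component of its key as its local file name. The names download_each and "downloads" are made up for illustration, and fs can be any object with a download(rpath, lpath) method, such as the S3FileSystem above:

```python
import os
import posixpath


def download_each(fs, remote_paths, dest_dir):
    """Download every remote path into dest_dir under its own file name."""
    os.makedirs(dest_dir, exist_ok=True)
    local_paths = []
    for rpath in remote_paths:
        # S3 keys always use "/" as the separator, whatever the local OS is
        name = posixpath.basename(rpath.rstrip("/"))
        lpath = os.path.join(dest_dir, name)
        fs.download(rpath, lpath)
        local_paths.append(lpath)
    return local_paths


# Hypothetical usage with the fs and bucket defined above:
# download_each(fs, fs.ls("s3://mys3bucket/mys3bucket"), "downloads")
```

If fs.ls also returns sub-directories, those entries would need recursive=True or would have to be skipped; this sketch does not handle them.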

edited Jul 22 '20 at 18:56

answered Jul 21 '20 at 15:58


Any idea how to modify this to download all files in a directory? – Zach Rieck Jul 21 '20 at 16:53
@ZachRieck I edited my answer to download all files. – Jacky Jul 22 '20 at 18:57
Nice! This really needs more visibility. Only answers I could find anywhere on how to use s3fs – Zach Rieck Jul 23 '20 at 19:47
In my version (0.2.2) `fs.download` doesn't seem to exist but the equivalent is `fs.get`. – Jsl Oct 23 '20 at 09:39
This does download all files and overwrite `test.csv` every time though, right? – Ted Brownlow Mar 29 '23 at 13:34

Zach Rieck · Answer 2 · 2020-07-21T20:59:41.550

I'm going to copy my answer here as well since I used this in a more general case:

# Access Pando
import os
import s3fs

# URL blocked out as "enter url here" for security reasons
fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': "enter url here"})

# List objects in a path and keep them in a list
# [-3:] keeps only the last three entries, to limit output while testing
files = fs.ls('hrrr/sfc/20190101')[-3:]

# Make a staging directory to hold the downloaded data (no error if it already exists)
os.makedirs("Staging", exist_ok=True)

# Copy files into that directory, naming each one after the last component of its key
for file in files:
    name = file.split("/")[-1]
    path = os.path.join("Staging", name)
    print(path)
    fs.download(file, path)

Note that the documentation for this particular Python package is fairly sparse. I was able to find some documentation on the arguments s3fs takes here (https://readthedocs.org/projects/s3fs/downloads/pdf/latest/). The full argument list is toward the end, though it doesn't say what the parameters mean. Here's a general guide to the download method:

-arg1 (rpath) is the remote source path: the S3 key of the file (or directory) you are getting. It has to be a path string, not an open file object; passing the S3File returned by fs.open is what raises the AttributeError OP got. As in both above answers, the easiest way to obtain the paths is to do an fs.ls on your S3 bucket and save the result to a variable

-arg2 (lpath) is the local destination: directory and file name. I have this defined as a path variable

-arg3 (recursive) is an optional flag; set it to True to download a directory and everything under it
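
To make the three arguments concrete, here is a minimal sketch of a small wrapper. The name fetch is hypothetical, and the fallback to fs.get is only an assumption based on the comment above that version 0.2.2 has get but no download; I have not checked that old versions accept recursive, so the flag is only passed when it is set.

```python
def fetch(fs, rpath, lpath, recursive=False):
    """Copy rpath (remote) to lpath (local) using fs.download or fs.get."""
    # An open file object such as S3File is not a path; fail early with a clear message
    if hasattr(rpath, "read"):
        raise TypeError("rpath must be a path string, not an open file object")
    # Prefer download; fall back to get on versions that lack it (assumption)
    method = getattr(fs, "download", None) or fs.get
    # Only forward recursive when asked for, in case an old get() does not know it
    kwargs = {"recursive": True} if recursive else {}
    return method(rpath, lpath, **kwargs)


# Hypothetical usage (the bucket and file names are placeholders):
# fetch(fs, "mys3bucket/mys3bucket/data.csv", "data.csv")            # one file
# fetch(fs, "mys3bucket/mys3bucket", "local_copy", recursive=True)   # a whole prefix
```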
