
How do I read a parquet file on S3 using Dask and a specific AWS profile (stored in a credentials file)? Dask uses s3fs, which in turn uses boto. This is what I have tried:

>>>import os
>>>import s3fs
>>>import boto3
>>>import dask.dataframe as dd

>>>os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "~/.aws/credentials"

>>>fs = s3fs.S3FileSystem(anon=False, profile_name="some_user_profile")
>>>fs.exists("s3://some.bucket/data/parquet/somefile")
True
>>>df = dd.read_parquet('s3://some.bucket/data/parquet/somefile')
NoCredentialsError: Unable to locate credentials

1 Answer


Never mind, that was easy, but I did not find any reference online, so here it is:

>>>import os
>>>import dask.dataframe as dd
>>>os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "/path/to/credentials"

>>>df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
                        storage_options={"profile_name": "some_user_profile"})
>>>df.head()
# works: storage_options is forwarded to the s3fs filesystem that Dask creates
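
As a side note, everything in storage_options is passed to the s3fs.S3FileSystem constructor, so other s3fs arguments (explicit keys, client_kwargs for region or endpoint overrides) work the same way. A minimal sketch with placeholder credentials and paths; note that recent s3fs releases spell the profile argument "profile" rather than "profile_name":

>>>opts = {"key": "<aws-access-key-id>",            # placeholder credentials
           "secret": "<aws-secret-access-key>",
           "client_kwargs": {"region_name": "us-east-1"}}
>>>df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
                        storage_options=opts)
>>>df.to_parquet('s3://some.bucket/data/parquet/copy',  # writing back uses the same mechanism
                 storage_options=opts)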
  • Documentation [here](http://dask.pydata.org/en/latest/remote-data-services.html#s3) - please feel free to submit improvements as a PR if you think it could be clearer. – mdurant Jan 22 '18 at 21:21
  • Thanks for posting both your question and answer online! Hopefully your efforts help others in the future. – MRocklin Jan 22 '18 at 21:32
  • @mdurant thanks I see it now, I did skim over that documentation page but missed it :( – muon Jan 22 '18 at 22:21
  • @muon , no problem! We are aware that the docs pages are rather voluminous :) – mdurant Jan 22 '18 at 22:41
  • this is not working with pd.read_parquet. Getting `read_table() got an unexpected keyword argument 'storage_options'` – Eduardo EPF Dec 03 '21 at 13:54
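
Regarding the last comment: pandas only added a storage_options argument to read_parquet in version 1.2, so with older pandas the unknown keyword is forwarded to pyarrow's read_table, which rejects it. A workaround sketch, assuming the same placeholder profile and a single parquet file (not a partitioned dataset), is to open the file with s3fs yourself and hand the file object to pandas:

>>>import pandas as pd
>>>import s3fs
>>>fs = s3fs.S3FileSystem(profile="some_user_profile")  # "profile_name" on older s3fs versions
>>>f = fs.open("some.bucket/data/parquet/somefile")
>>>pdf = pd.read_parquet(f)   # pandas accepts an open file-like object
>>>f.close()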