
Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the default profile of the ~/.aws/credentials file. How do I specify which profile pandas should use when reading a CSV from S3?

E.g.

s3_path = 's3://mybucket/myfile.csv'
df = pd.read_csv(s3_path)
$ cat ~/.aws/credentials
[default]
aws_access_key_id = ABCD
aws_secret_access_key = XXXX
[profile2]
aws_access_key_id = PQRS
aws_secret_access_key = YYYY
[profile3]
aws_access_key_id = XYZW
aws_secret_access_key = ZZZZ

Edit:

Current hack/working solution:

import botocore.session
import pandas as pd
import s3fs

session = botocore.session.Session(profile='profile2')
s3 = s3fs.core.S3FileSystem(anon=False, session=session)
df = pd.read_csv(s3.open(path_to_s3_csv))

The only issue with the above solution is that you need to import two different libraries and instantiate two objects. I'm keeping the question open to see if there is a cleaner/simpler method.

Spandan Brahmbhatt

5 Answers

Since pandas 1.2, the profile can be passed straight through `storage_options`, which is handed on to s3fs:

df = pd.read_csv(s3_path, storage_options=dict(profile='profile2'))
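`storage_options` is forwarded verbatim to the underlying s3fs filesystem, so any `S3FileSystem` keyword works there — explicit keys as well as a named profile. A minimal sketch (the bucket, profile, and credentials are the question's placeholders; the read itself needs real credentials and network access, so it is left commented out):

```python
import pandas as pd

s3_path = 's3://mybucket/myfile.csv'  # placeholder path from the question

# Named profile from ~/.aws/credentials:
profile_opts = dict(profile='profile2')

# Or explicit keys, if no credentials file is available:
key_opts = dict(key='ABCD', secret='XXXX')  # placeholder credentials

# Needs real credentials and network access, so not run here:
# df = pd.read_csv(s3_path, storage_options=profile_opts)
```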
loknar

If you only need to use one profile, setting the environment variable AWS_DEFAULT_PROFILE works:

import os
import pandas as pd

os.environ["AWS_DEFAULT_PROFILE"] = "profile2"
df = pd.read_csv(path_to_s3_csv)
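One caveat worth noting: the variable has to be set before the first S3 access in the process, because botocore resolves credentials when the session is created (and s3fs caches filesystem instances). A sketch of the ordering, using the question's profile name:

```python
import os

# Set the profile *before* pandas/s3fs first touches S3 in this process:
os.environ["AWS_DEFAULT_PROFILE"] = "profile2"

# Only after that:
# import pandas as pd
# df = pd.read_csv('s3://mybucket/myfile.csv')  # needs credentials, so not run here
```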
import s3fs
s3 = s3fs.S3FileSystem(anon=False, profile_name="your-profile-name")

I believe that, to avoid using boto directly, you can use this S3FileSystem class from s3fs. Then open the file with a handler, something like:

with s3.open('bucket/file.txt', 'rb') as f:
    df = pd.read_csv(f)
emendez
    I am not sure `profile_name` is a keyword arg. https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem – Spandan Brahmbhatt Jun 24 '20 at 20:05
    It doesn't, but it does accept a boto3 session as a parameter, and you can create your `session` with the `profile_name` and then pass it in. – Cargo23 Jun 24 '20 at 20:09

I'm not sure this is "better", but it is working for me using boto3 directly, without needing s3fs or an environment variable.

import boto3
import pandas as pd

s3_session = boto3.Session(profile_name="profile_name")
s3_client = s3_session.client("s3")
df = pd.read_csv(s3_client.get_object(Bucket='bucket', Key='key.csv').get('Body'))
Scott Brenstuhl

If you are unable to configure your ~/.aws/config file:

import pandas as pd
import s3fs

KEY_ID = 'xxxx'
ACCESS_KEY = 'yyyy'
fp = 's3://my-bucket/test/abc.csv'

fs = s3fs.S3FileSystem(anon=False, key=KEY_ID, secret=ACCESS_KEY)
with fs.open(fp) as f:
    df = pd.read_csv(f)
yl_low