
Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the default profile of the ~/.aws/credentials file. How do I specify which profile pandas should use when reading a CSV from S3?

E.g.

s3_path = 's3://mybucket/myfile.csv'
df = pd.read_csv(s3_path)
$ cat ~/.aws/credentials
[default]
aws_access_key_id = ABCD
aws_secret_access_key = XXXX
[profile2]
aws_access_key_id = PQRS
aws_secret_access_key = YYYY
[profile3]
aws_access_key_id = XYZW
aws_secret_access_key = ZZZZ

Edit:

Current hack/working solution:

import botocore.session
import pandas as pd
import s3fs

session = botocore.session.Session(profile='profile2')
s3 = s3fs.core.S3FileSystem(anon=False, session=session)
df = pd.read_csv(s3.open(path_to_s3_csv))

The only issue with the above solution is that you need to import two different libraries and instantiate two objects. I'm keeping the question open to see if there is a cleaner/simpler method.

Spandan Brahmbhatt

5 Answers

Since pandas 1.2, the profile can be passed straight through `storage_options`, which is handed on to s3fs:

df = pd.read_csv(s3_path, storage_options=dict(profile='profile2'))
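`storage_options` is forwarded verbatim to the underlying s3fs filesystem, so any `S3FileSystem` keyword works there — explicit keys as well as a named profile. A minimal sketch (the bucket, profile, and credentials are the question's placeholders; the read itself needs real credentials and network access, so it is left commented out):

```python
import pandas as pd

s3_path = 's3://mybucket/myfile.csv'  # placeholder path from the question

# Named profile from ~/.aws/credentials:
profile_opts = dict(profile='profile2')

# Or explicit keys, if no credentials file is available:
key_opts = dict(key='ABCD', secret='XXXX')  # placeholder credentials

# Needs real credentials and network access, so not run here:
# df = pd.read_csv(s3_path, storage_options=profile_opts)
```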
loknar

If you only need to use one profile, setting the environment variable AWS_DEFAULT_PROFILE works:

import os
import pandas as pd

os.environ["AWS_DEFAULT_PROFILE"] = "profile2"
df = pd.read_csv(path_to_s3_csv)
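One caveat worth noting: the variable has to be set before the first S3 access in the process, because botocore resolves credentials when the session is created (and s3fs caches filesystem instances). A sketch of the ordering, using the question's profile name:

```python
import os

# Set the profile *before* pandas/s3fs first touches S3 in this process:
os.environ["AWS_DEFAULT_PROFILE"] = "profile2"

# Only after that:
# import pandas as pd
# df = pd.read_csv('s3://mybucket/myfile.csv')  # needs credentials, so not run here
```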
import s3fs
s3 = s3fs.S3FileSystem(anon=False, profile_name="your-profile-name")

I believe that, to avoid using boto directly, you can use this S3FileSystem class from s3fs. Then open the file with a handler, something like:

with s3.open('bucket/file.txt', 'rb') as f:
    df = pd.read_csv(f)
emendez
    I am not sure `profile_name` is a keyword arg. https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem – Spandan Brahmbhatt Jun 24 '20 at 20:05
    It doesn't, but it does accept a boto3 session as a parameter, and you can create your `session` with the `profile_name` and then pass it in. – Cargo23 Jun 24 '20 at 20:09

I'm not sure this is "better", but it is working for me using boto3 directly, without needing s3fs or an environment variable.

import boto3
import pandas as pd

s3_session = boto3.Session(profile_name="profile_name")
s3_client = s3_session.client("s3")
df = pd.read_csv(s3_client.get_object(Bucket='bucket', Key='key.csv').get('Body'))
Scott Brenstuhl

If you are unable to configure your ~/.aws/config file:

import pandas as pd
import s3fs

KEY_ID = 'xxxx'
ACCESS_KEY = 'yyyy'
fp = 's3://my-bucket/test/abc.csv'

fs = s3fs.S3FileSystem(anon=False, key=KEY_ID, secret=ACCESS_KEY)
with fs.open(fp) as f:
    df = pd.read_csv(f)
yl_low