
I have an AWS Lambda function that queries an API and creates a DataFrame. I want to write this DataFrame as a CSV file to an S3 bucket. I am using:

import pandas as pd
import s3fs

df.to_csv('s3.console.aws.amazon.com/s3/buckets/info/test.csv', index=False)

I am getting an error:

No such file or directory: 's3.console.aws.amazon.com/s3/buckets/info/test.csv'

But that directory exists, because I am reading files from there. What is the problem here?

I've read the existing files like this:

import boto3

s3_client = boto3.client('s3')
s3_client.download_file('info', 'secrets.json', '/tmp/secrets.json')

How can I upload the whole dataframe to an S3 bucket?


3 Answers


You can also use the boto3 package to store data to S3:

from io import StringIO  # Python 3 (use BytesIO for Python 2)
import boto3

bucket = 'info'  # the bucket must already exist on S3

# Serialize the DataFrame to an in-memory buffer
csv_buffer = StringIO()
df.to_csv(csv_buffer)

# Upload the buffer contents as an object named 'df.csv'
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
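If the object doesn't show up in the bucket, it can help to check the response from put (it includes the HTTP status code) or to ask S3 directly whether the key exists. A small sketch reusing the names above:

response = s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
status = response['ResponseMetadata']['HTTPStatusCode']  # 200 on success
print(status)

# head_object raises a ClientError if the key does not exist yet
s3_resource.meta.client.head_object(Bucket=bucket, Key='df.csv')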
  • I tried that and I'm not getting any errors; everything seems to go through, but the file never appears in the bucket. Any idea why that might be? Does it take time to upload? It's a 300 MB file. I thought it would at least appear in the folder in the bucket. – Raksha Dec 18 '22 at 02:54
  • @Raksha yes, it may take a while depending on the file size, CPU, and RAM of your server. – wowkin2 Dec 19 '22 at 16:46
  • I waited all weekend and ran the script multiple times. Still nothing uploaded :( Any idea why that might be? – Raksha Dec 19 '22 at 17:59
  • @Raksha You need to add a try/except or verbose logging around the last line in the example above, so you can see what went wrong. Probably there is some error that fails silently. Another option is to disable the internet connection on your computer and see at which step it raises an exception, to find where it gets stuck. – wowkin2 Dec 27 '22 at 10:57
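A minimal sketch of the try/except logging suggested in the last comment, assuming the same bucket, csv_buffer, and s3_resource names from the answer above:

import logging
from botocore.exceptions import BotoCoreError, ClientError

logger = logging.getLogger(__name__)

try:
    response = s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
    logger.info("Upload returned HTTP %s",
                response['ResponseMetadata']['HTTPStatusCode'])
except (BotoCoreError, ClientError):
    # logger.exception records the full traceback, so nothing fails silently
    logger.exception("Upload to S3 failed")

For a file as large as 300 MB it may also be worth switching to boto3's managed transfer, e.g. s3_client.upload_fileobj, which splits large uploads into multipart requests automatically.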

This

"s3.console.aws.amazon.com/s3/buckets/info/test.csv"

is not an S3 URI; it is the S3 console URL. You need to pass an S3 URI to save to S3. Moreover, you do not need to import s3fs (you only need it installed; pandas uses it behind the scenes when given an s3:// path).

Just try:

import pandas as pd

df = pd.DataFrame()
# df.to_csv("s3://<bucket_name>/<obj_key>")

# In your case
df.to_csv("s3://info/test.csv")

NOTE: You need to create the bucket on AWS S3 first.
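If the bucket doesn't exist yet, one way to create it with boto3 might look like this (note that outside us-east-1 you have to pass a LocationConstraint; the region name below is just an example):

import boto3

s3 = boto3.client('s3')
s3.create_bucket(Bucket='info')  # works as-is only in us-east-1

# In any other region, the region must be given explicitly:
# s3.create_bucket(
#     Bucket='info',
#     CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
# )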


You can use the AWS SDK for pandas (awswrangler), a library that extends pandas to work smoothly with AWS data stores.

import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
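The snippet above shows reading; since the question asks about writing, the matching writer is wr.s3.to_csv. A minimal sketch (the bucket path is a placeholder):

import awswrangler as wr

wr.s3.to_csv(df=df, path="s3://bucket/file.csv", index=False)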

The library is available in AWS Lambda with the addition of the layer called AWSSDKPandas-Python.
