
I have an AWS Lambda function that queries an API and creates a DataFrame. I want to write this DataFrame as a CSV file to an S3 bucket. I am using:

import pandas as pd
import s3fs

df.to_csv('s3.console.aws.amazon.com/s3/buckets/info/test.csv', index=False)

I am getting an error:

No such file or directory: 's3.console.aws.amazon.com/s3/buckets/info/test.csv'

But that directory exists, because I am reading files from there. What is the problem here?

I've read the existing files like this:

import boto3

s3_client = boto3.client('s3')
s3_client.download_file('info', 'secrets.json', '/tmp/secrets.json')

How can I upload the whole dataframe to an S3 bucket?


3 Answers


You can also use the boto3 package to store data to S3:

from io import StringIO  # Python 3 (use BytesIO for Python 2)
import boto3

bucket = 'info'  # the bucket must already exist on S3

# Serialize the DataFrame to an in-memory buffer
csv_buffer = StringIO()
df.to_csv(csv_buffer)

# Upload the buffer contents as an object named 'df.csv'
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
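If the object doesn't show up in the bucket, it can help to check the response from put (it includes the HTTP status code) or to ask S3 directly whether the key exists. A small sketch reusing the names above:

response = s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
status = response['ResponseMetadata']['HTTPStatusCode']  # 200 on success
print(status)

# head_object raises a ClientError if the key does not exist yet
s3_resource.meta.client.head_object(Bucket=bucket, Key='df.csv')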
  • I tried that and I'm not getting any errors; everything seems to go through, but the file never appears in the bucket. Any idea why that might be? Does it take time to upload? It's a 300 MB file. I thought it would at least appear in the folder in the bucket. – Raksha Dec 18 '22 at 02:54
  • @Raksha yes, it may take a while depending on the file size, CPU, and RAM of your server. – wowkin2 Dec 19 '22 at 16:46
  • I waited all weekend and ran the script multiple times. Still nothing uploaded :( Any idea why that might be? – Raksha Dec 19 '22 at 17:59
  • @Raksha You need to add a try/except or verbose logging around the last line in the example above, so you can see what went wrong. Probably there is some error that fails silently. Another option is to disable the internet connection on your computer and see at which step it raises an exception, to find where it gets stuck. – wowkin2 Dec 27 '22 at 10:57
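A minimal sketch of the try/except logging suggested in the last comment, assuming the same bucket, csv_buffer, and s3_resource names from the answer above:

import logging
from botocore.exceptions import BotoCoreError, ClientError

logger = logging.getLogger(__name__)

try:
    response = s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
    logger.info("Upload returned HTTP %s",
                response['ResponseMetadata']['HTTPStatusCode'])
except (BotoCoreError, ClientError):
    # logger.exception records the full traceback, so nothing fails silently
    logger.exception("Upload to S3 failed")

For a file as large as 300 MB it may also be worth switching to boto3's managed transfer, e.g. s3_client.upload_fileobj, which splits large uploads into multipart requests automatically.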

This

"s3.console.aws.amazon.com/s3/buckets/info/test.csv"

is not an S3 URI; it is the S3 console URL. You need to pass an S3 URI to save to S3. Moreover, you do not need to import s3fs (you only need it installed; pandas uses it behind the scenes when given an s3:// path).

Just try:

import pandas as pd

df = pd.DataFrame()
# df.to_csv("s3://<bucket_name>/<obj_key>")

# In your case
df.to_csv("s3://info/test.csv")

NOTE: You need to create the bucket on AWS S3 first.
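If the bucket doesn't exist yet, one way to create it with boto3 might look like this (note that outside us-east-1 you have to pass a LocationConstraint; the region name below is just an example):

import boto3

s3 = boto3.client('s3')
s3.create_bucket(Bucket='info')  # works as-is only in us-east-1

# In any other region, the region must be given explicitly:
# s3.create_bucket(
#     Bucket='info',
#     CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
# )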


You can use the AWS SDK for pandas (awswrangler), a library that extends pandas to work smoothly with AWS data stores.

import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
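The snippet above shows reading; since the question asks about writing, the matching writer is wr.s3.to_csv. A minimal sketch (the bucket path is a placeholder):

import awswrangler as wr

wr.s3.to_csv(df=df, path="s3://bucket/file.csv", index=False)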

The library is available in AWS Lambda with the addition of the layer called AWSSDKPandas-Python.
