I'm trying to extract data from BigQuery, save it to a CSV file, and then upload it to S3, but I'm getting an error on the upload to S3. This is the error I get when I run the script:

    raise ValueError('Filename must be a string')

Could you please help me solve this issue? I'm new to Python and AWS. Thank you.

Script is:



    rows_df = query_job.result().to_dataframe() 
    file_csv = rows_df.to_csv(s3_filename, sep='|', index=False, encoding='utf-8')
    s3.upload_file(file_csv, s3_bucket, file_csv)


Justine

2 Answers


Try changing the arguments passed to s3.upload_file like so:

    s3.upload_file(s3_filename, s3_bucket, s3_filename)

The to_csv call writes the dataframe to a local file at the path s3_filename and returns None, so file_csv is None. Alternatively, if your dataframe is small enough to be held in memory, the following should do the trick:

    import io

    # to_csv with no path returns the CSV content as a string
    data = rows_df.to_csv(sep='|', index=False, encoding='utf-8')
    # upload_fileobj expects a binary file-like object, so encode the string to bytes
    data_buffer = io.BytesIO(data.encode('utf-8'))
    s3.upload_fileobj(data_buffer, s3_bucket, s3_filename)
Milan Cermak
  • What's the difference between upload_file and upload_fileobj? I tried running the script with your suggestion and it returned the error ValueError: Fileobj must implement read. – Justine Jun 07 '20 at 17:45
  • Here's a SO answer to that: https://stackoverflow.com/questions/52336902/what-is-the-difference-between-s3-client-upload-file-and-s3-client-upload-file I've updated the second code example; data_buffer is now a file-like object that upload_fileobj accepts. – Milan Cermak Jun 07 '20 at 18:35
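
To make the difference raised in the comments concrete, here's a minimal sketch (the bucket name, keys, and paths below are placeholders): upload_file takes the path of a local file as a string, while upload_fileobj takes an open binary file-like object, i.e. anything with a read() method that returns bytes.

    import io
    import boto3

    s3 = boto3.client('s3')

    # upload_file streams a file from disk, given its path as a string
    s3.upload_file('/tmp/data.csv', 'my-bucket', 'exports/data.csv')

    # upload_fileobj takes an open binary file-like object instead
    with open('/tmp/data.csv', 'rb') as f:
        s3.upload_fileobj(f, 'my-bucket', 'exports/data.csv')

    # an in-memory buffer works too, as in the updated answer above
    buf = io.BytesIO(b'col_a|col_b\n1|2\n')
    s3.upload_fileobj(buf, 'my-bucket', 'exports/data.csv')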

Based on the pandas docs, to_csv returns None when path_or_buf is specified. However, upload_file needs a local filename and an S3 key as its first and third arguments respectively. Therefore, something like this should make it work:

    s3.upload_file(s3_filename, s3_bucket, s3_filename)
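
For completeness, a minimal sketch of the whole flow under the same assumptions (the bucket, table, and file names are placeholders, and credentials are already configured for both clients):

    import boto3
    from google.cloud import bigquery

    bq = bigquery.Client()
    s3 = boto3.client('s3')

    s3_bucket = 'my-bucket'      # placeholder bucket name
    s3_filename = 'export.csv'   # used as both the local path and the S3 key

    query_job = bq.query('SELECT * FROM `my_dataset.my_table`')  # placeholder query
    rows_df = query_job.result().to_dataframe()

    # to_csv writes to the local path and returns None,
    # so pass the path itself to upload_file
    rows_df.to_csv(s3_filename, sep='|', index=False, encoding='utf-8')
    s3.upload_file(s3_filename, s3_bucket, s3_filename)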
jellycsc