
I currently use the following code to build a CSV file in memory and upload it directly to AWS S3. I was told that csv.writer can write directly to a binary stream, which would avoid the extra step through io.StringIO(). How does that work?

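import csv
import io

import boto3

# s3 is assumed to be a boto3 S3 client
s3 = boto3.client('s3')

# Write the CSV as text, then re-encode the result into a binary buffer
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["a", "b", "c"])
buffer_2 = io.BytesIO(buffer.getvalue().encode())

BUCKET_NAME = 'fbprophet'
OBJECT_NAME = 'blah.csv'

s3.upload_fileobj(buffer_2, BUCKET_NAME, OBJECT_NAME)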
Joey Coder
  • python version 2 or 3? – Shijith Aug 16 '19 at 07:06
  • It's `Python 3` – Joey Coder Aug 16 '19 at 07:07
  • refer https://stackoverflow.com/questions/5358322/csv-modules-writer-wont-let-me-write-binary-out – Shijith Aug 16 '19 at 07:09
  • I mentioned that link in my post, but that's where I got stuck, I don't understand how to modify my code to manage the same. – Joey Coder Aug 16 '19 at 07:12
  • I don't believe this is a duplicate. @JoeyCoder is not writing to a file but to an in-memory file-like object, so passing flags to `open` isn't going to cut it. I had an answer already written; the short of it is: your approach is fine, and I don't see a way to significantly improve it. Just be sure to call `encode('utf-8')` so the encoding is explicit rather than left to a default. – Thomas Aug 16 '19 at 07:13
  • That's good feedback, thank you @Thomas. – Joey Coder Aug 16 '19 at 07:26

1 Answer


What you've got there looks reasonable to me. The post you link to talks about writing to files, not in-memory streams. A file can be opened in either text or binary mode, which determines whether it operates on strings (str) or raw bytes (bytes). But the in-memory file-like objects from the io module aren't as flexible: you have StringIO for strings, and BytesIO for bytes.
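To make the distinction concrete, here is a small sketch (the variable names are illustrative, not from the question) showing that csv.writer works with a StringIO but raises a TypeError when handed a bare BytesIO in Python 3:

```python
import csv
import io

# Text buffer: csv.writer writes str, which StringIO accepts.
text_buffer = io.StringIO()
csv.writer(text_buffer).writerow(["a", "b", "c"])
text_value = text_buffer.getvalue()  # a str

# Binary buffer: BytesIO.write() only accepts bytes-like objects,
# so the str that csv.writer produces is rejected.
binary_buffer = io.BytesIO()
try:
    csv.writer(binary_buffer).writerow(["a", "b", "c"])
    binary_write_failed = False
except TypeError:
    binary_write_failed = True
```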

Because csv requires a text stream (strings), and boto requires a binary stream (bytes), a conversion step is necessary.

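I would recommend passing the encoding explicitly to encode(), though. In Python 3, str.encode() already defaults to UTF-8, so this does not change the result, but it makes the intent obvious and avoids confusion with APIs such as open(), whose default encoding is system-dependent: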

buffer_2 = io.BytesIO(buffer.getvalue().encode('utf-8'))
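
If you would still like csv.writer to target the binary buffer without an intermediate StringIO, one possible approach is to wrap the BytesIO in an io.TextIOWrapper, which performs the str-to-bytes conversion as rows are written. The following is a minimal sketch under that assumption; the helper name rows_to_csv_bytes is hypothetical, and whether this is any clearer than the two-buffer version is a matter of taste:

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="utf-8"):
    """Write rows as CSV into a BytesIO and return it, rewound to the start.

    Hypothetical helper for illustration only.
    """
    binary_buffer = io.BytesIO()
    # The wrapper encodes text to bytes on the fly; newline="" leaves
    # line endings under the control of the csv module.
    text_layer = io.TextIOWrapper(binary_buffer, encoding=encoding, newline="")
    csv.writer(text_layer).writerows(rows)
    # Push any buffered text down into the BytesIO.
    text_layer.flush()
    # Detach so the wrapper does not close binary_buffer when it is discarded.
    text_layer.detach()
    binary_buffer.seek(0)
    return binary_buffer


csv_bytes = rows_to_csv_bytes([["a", "b", "c"]])
```

The returned buffer could then be passed in place of buffer_2, as in s3.upload_fileobj(csv_bytes, BUCKET_NAME, OBJECT_NAME). A conversion step still happens; it has just moved inside the wrapper.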
Thomas