I understand how to write UTF-8 strings in Python 3 and that io.StringIO is recommended for that kind of in-memory string building. However, I specifically need a binary file-like object, which means io.BytesIO. If I do the following, the load blows up because the data gets read as Latin-1, my computer's default locale/charset:
with io.StringIO() as sb:
    csv.writer(sb).writerows(rows)
    sb.flush()
    sb.seek(0)
    # blows up with a Latin-1 encoding error
    job = bq.load_table_from_file(sb, table_ref, job_config=job_config)
So my workaround is this monstrosity, which doubles the amount of memory used:
with io.StringIO() as sb:
    csv.writer(sb).writerows(rows)
    sb.flush()
    sb.seek(0)
    with io.BytesIO(sb.getvalue().encode('utf-8')) as buffer:
        job = bq.load_table_from_file(buffer, table_ref, job_config=job_config)
Somewhere in this chain there must be a way to specify the byte encoding so that readers of the file-like sb will see the data as UTF-8. Or is there a way to use csv.writer() with a byte stream?
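To show the kind of thing I'm hoping exists: wrapping the BytesIO in an io.TextIOWrapper so that csv.writer sees a text stream while the underlying buffer collects UTF-8 bytes directly. The rows below are made-up sample data, and I haven't verified that this interacts cleanly with load_table_from_file:

```python
import csv
import io

# Made-up sample rows for illustration
rows = [["name", "city"], ["José", "São Paulo"]]

with io.BytesIO() as buffer:
    # Text layer that encodes to UTF-8 as csv.writer writes;
    # newline="" stops the wrapper from translating line endings
    wrapper = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
    csv.writer(wrapper).writerows(rows)
    wrapper.flush()
    buffer.seek(0)
    data = buffer.read()  # UTF-8 bytes, no second copy of the string
    # in the real code, this is where buffer would be handed to
    # bq.load_table_from_file(buffer, table_ref, job_config=job_config)

print(data.decode("utf-8"))
```

The appeal is that only one buffer is ever allocated, instead of a str buffer plus a bytes copy.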
I've looked for both of these answers on Stack Overflow, but what I've found generally covers writing to files on disk; for building data in memory, everything points to StringIO.