15

I'm trying to port some code from Python 2.7 to Python 3. The 2to3 tool works fine for the base syntax and package changes, but now we're into some strange side effects.

I have the following code block. It opens a temporary file name using the gzip module.

f = NamedTemporaryFile(delete=False)
f.close()
fn = f.name + '.gz'
os.rename(f.name, fn)
fz = gzip.open(fn, 'wb')
writer = csv.writer(fz, delimiter='\t', lineterminator=lt)
for row in table:
    writer.writerow(row)
fz.close()

The problem is that executing this gives me the following error:

File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/gzip.py", line 343, in write
self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface

I've tried opening the gzip file as 'w' instead of 'wb', but to no avail. I'm guessing the gzip module is expecting a byte array, but the CSV Writer doesn't or won't provide anything other than a string.

How do people do this in Python 3?

Edit: I should mention that this code block executes without issue in Python 2.7.

WineSoaked
  • 1,455
  • 1
  • 14
  • 14

1 Answers1

36

You need to change the mode of gzip to wt :

fz = gzip.open(fn, 'wt')

Also a little-known feature of gzip.open() and bz2.open() is that they can be layered on top of an existing file opened in binary mode. For example, this works:

import gzip
f = open('somefile.gz', 'rb')
with gzip.open(f, 'rt') as g:
    text = g.read()

This allows the gzip and bz2 modules to work with various file-like objects such as sockets, pipes, and in-memory files.

Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • Yup, Python 3 means you have to be very VERY careful with the flags used to open the files. Man, what a headache. – WineSoaked Nov 29 '14 at 19:42
  • @WineSoaked :) , yes ,and also a little-known feature of `gzip.open()` and `bz2.open()` is that they can be layered on top of an existing file opened in binary mode . – Mazdak Nov 29 '14 at 19:44
  • 5
    You should open for writing with `gzip.open(fn, 'wt', newline='')`, otherwise csv's newline handling may be compromised (ie: if a csv entry has an embedded newline). – drevicko Jan 29 '18 at 12:05
  • If in the same script you want to upload that file, then you need to close the file handler `f`, otherwise the uploaded gzip file will be damaged. So just do `f.close()` after the `with` block. – Zoltan Fedor Aug 17 '18 at 19:52