Can I append to a compressed stream with pandas?

Question

I know that by passing the compression='gzip' argument to DataFrame.to_csv() I can save a DataFrame as a compressed CSV file.

my_df.to_csv('my_file_name.csv', compression='gzip')

I also know that if I want to append a DataFrame to the end of an existing CSV file I can use mode='a', like so:

my_df.to_csv('my_file_name.csv', mode='a', index=False)

But what if I want to append a DataFrame to the end of a compressed CSV file? Is that even possible? I tried to do so with

my_df.to_csv('my_file_name.csv', mode='a', index=False, compression='gzip')

But the resulting CSV was not compressed, although its contents were otherwise intact.

This question is motivated by my processing of a large CSV file with Pandas. I need to produce compressed CSV output, and I am reading the CSV file in chunks into a DataFrame so that I don't run into a MemoryError. Hence, the seemingly most logical thing for me to do is to append each output DataFrame chunk to one gzip-compressed file.

I am using Python 3.4 and Pandas 0.16.1.

Appending a gzipped data frame works for me in pandas 0.18.1. You can also just [concatenate gzipped files](http://stackoverflow.com/questions/8005114/fast-concatenation-of-multiple-gzip-files). — ptrj, Jul 29 '16 at 12:57
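
The concatenation trick relies on the gzip format allowing several compressed members back to back; Python's gzip module reads such a file as one continuous stream. A small sketch using only the standard library (the function names and the temporary path are illustrative assumptions, not part of the original comment):

```python
import gzip
import os
import tempfile


def append_gzip_member(path, text):
    """Append `text` to `path` as a new, separately compressed gzip member."""
    # Mode "at" opens the file for appending in text mode.
    with gzip.open(path, "at", encoding="utf-8", newline="") as handle:
        handle.write(text)


def read_all_members(path):
    """Decompress every member in `path` and return the combined text."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return handle.read()


# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
append_gzip_member(demo_path, "a,b\n1,2\n")  # first member, with header
append_gzip_member(demo_path, "3,4\n")       # second member, data only
combined = read_all_members(demo_path)
```

Each append adds its own gzip header and trailer, so many tiny appends will probably compress less well than writing everything through a single open handle.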

score 3 · Answer 1 · answered May 31 '21 at 22:13

Up-to-date answer: worked for me with pandas 1.2.4

Code:

df.to_csv('test.csv', mode='a', compression='gzip')
new_df = pd.read_csv('test.csv', compression='gzip')

df.shape[0] # 1x
new_df.shape[0] # 2x

Julian

yea, but it's not compressed. – user189035 May 11 '22 at 11:49
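
One way to settle whether the output is really compressed is to look at the first two bytes of the file: a gzip stream starts with the magic bytes 0x1f 0x8b. A minimal sketch follows; the helper name `looks_gzipped` and the demo file names are made up for illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# Hypothetical demo files, created in a scratch directory.
scratch_dir = tempfile.mkdtemp()
plain_path = os.path.join(scratch_dir, "plain.csv")
gzip_path = os.path.join(scratch_dir, "compressed.csv.gz")

with open(plain_path, "w", newline="") as handle:
    handle.write("a,b\n1,2\n")
with gzip.open(gzip_path, "wt", newline="") as handle:
    handle.write("a,b\n1,2\n")

plain_is_gzip = looks_gzipped(plain_path)
compressed_is_gzip = looks_gzipped(gzip_path)
```

Running the same check on the file produced by to_csv should show whether a given pandas version honoured compression='gzip' together with mode='a'.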

score 1 · Answer 2 · answered Nov 12 '18 at 19:21

You can do the following:

import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    df.to_csv(compressed_file, index=False)

This works because the pandas .to_csv method accepts either a path or a file-like object.

paulo.filip3

Does not work with Python 3.6.9, pandas 0.25.3: `TypeError: memoryview: a bytes-like object is required, not 'str'` – ApproachingDarknessFish Jan 03 '20 at 00:14

How was this resolved? I am getting the same error. – Nagasri Varma Jul 10 '20 at 08:05
`compressed_file.write(df.to_csv().encode())` now works. – cmosig Aug 26 '20 at 09:35

score 1 · Answer 3 · answered Aug 26 '20 at 09:38

The above answer does not seem to work anymore. When df.to_csv() is given no path or file-like object, it returns the DataFrame as a CSV string. This string can be encoded to bytes and written to the gzip file.

import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    compressed_file.write(df.to_csv().encode())
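
Note that the snippet above writes the header row (and the index) again on every append. A possible variation, assuming it is acceptable to open the gzip file in text mode ('at') so that to_csv can write strings directly, emits the header only when the file is new. The helper name `append_chunk`, the scratch path, and the sample frames are hypothetical:

```python
import gzip
import os
import tempfile

import pandas as pd


def append_chunk(df, path):
    """Append `df` to a gzip-compressed CSV, writing the header only once."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # Text mode ("at") lets to_csv write str instead of bytes;
    # newline="" leaves line endings to the CSV writer.
    with gzip.open(path, "at", encoding="utf-8", newline="") as compressed_file:
        df.to_csv(compressed_file, index=False, header=is_new)


# Hypothetical chunks standing in for the pieces of a large input file.
out_path = os.path.join(tempfile.mkdtemp(), "my_file_name.csv.gz")
chunks = [
    pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}),
    pd.DataFrame({"x": [3], "y": ["c"]}),
]
for chunk in chunks:
    append_chunk(chunk, out_path)

# Read the multi-member gzip file back as a single DataFrame.
round_trip = pd.read_csv(out_path, compression="gzip")
```

When all chunks are produced in one run, opening the gzip file once and writing every chunk to the same handle is likely the simpler design; the append form is mainly useful when chunks arrive across separate runs.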
