
I know that by passing the compression='gzip' argument to DataFrame.to_csv() I can save a DataFrame to a compressed CSV file.

my_df.to_csv('my_file_name.csv', compression='gzip')

I also know that if I want to append a DataFrame to the end of an existing CSV file I can use mode='a', like so

my_df.to_csv('my_file_name.csv', mode='a', index=False)

But what if I want to append a DataFrame to the end of a compressed CSV file? Is that even possible? I tried to do so with

my_df.to_csv('my_file_name.csv', mode='a', index=False, compression='gzip')

But the resulting CSV was not compressed, although its contents were otherwise intact.


This question is motivated by my processing of a large CSV file with Pandas. I need to produce compressed CSV output, and I am reading the CSV file in chunks into a DataFrame so that I don't run into a MemoryError. Hence, the most logical approach seems to be to append each output DataFrame chunk to a single gzip-compressed file.

I am using Python 3.4 and Pandas 0.16.1.

Alex Fortin
Eric Hansen
    Appending a gzipped data frame works for me in pandas 0.18.1. You can also just [concatenate gzipped files](http://stackoverflow.com/questions/8005114/fast-concatenation-of-multiple-gzip-files). – ptrj Jul 29 '16 at 12:57
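To illustrate the concatenation point in the comment above: the gzip format allows several compressed members back to back in one file, and Python's gzip module reads them as a single stream. Below is a minimal sketch using only the standard library; the helper name `concat_gzip` and the file paths are hypothetical, not part of pandas or of the linked answer.

```python
import gzip
import shutil


def concat_gzip(parts, dest):
    """Concatenate already-gzipped files byte for byte into dest (hypothetical helper)."""
    with open(dest, 'wb') as out:
        for part in parts:
            with open(part, 'rb') as src:
                # Raw byte copy: each input becomes one gzip member in dest.
                shutil.copyfileobj(src, out)
```

Note that if every part was written with its own header row, those header lines will repeat in the combined file, so the parts after the first would probably need to be written without a header.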

3 Answers


Up-to-date answer: appending with compression='gzip' works for me with pandas 1.2.4.

Code:

df.to_csv('test.csv', mode='a', compression='gzip')
new_df = pd.read_csv('test.csv', compression='gzip')

df.shape[0] # 1x
new_df.shape[0] # 2x
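
For the chunked use case in the question, here is a rough sketch of how this could be applied. It assumes a pandas version where mode='a' together with compression='gzip' appends correctly (as reported here for 1.2.4). The function name `process_in_chunks`, the paths, and the chunk size are illustrative assumptions. The header is written only for the first chunk, because otherwise each append would repeat the column names in the middle of the data.

```python
import pandas as pd


def process_in_chunks(src_csv, dest_gz, chunksize=100_000):
    """Read src_csv in chunks and append each one to a single gzip-compressed CSV."""
    reader = pd.read_csv(src_csv, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # Placeholder: transform `chunk` here before writing it out.
        first = i == 0
        chunk.to_csv(
            dest_gz,
            mode='w' if first else 'a',  # overwrite on the first chunk, append afterwards
            header=first,                # write the column names only once
            index=False,
            compression='gzip',
        )
```

The result can be read back in one go with pd.read_csv(dest_gz, compression='gzip').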
Julian

You can do the following

import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    df.to_csv(compressed_file, index=False)

This works because the pandas to_csv method accepts either a path or a file-like object.
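
If writing to the binary handle fails in your pandas version, one variant that may help is to open the gzip file in text-append mode ('at'), so that to_csv receives a handle that accepts strings. The sketch below is untested across pandas versions; the helper name `append_to_gzip_csv` is hypothetical, and the header check is an added assumption so that the column names appear only once.

```python
import gzip
import os

import pandas as pd


def append_to_gzip_csv(df, path):
    """Append df to a gzip-compressed CSV at path (hypothetical helper)."""
    # Write the header only when the file does not exist yet or is empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'at' = append in text mode; newline='' leaves line endings to pandas.
    with gzip.open(path, 'at', newline='') as compressed_file:
        df.to_csv(compressed_file, index=False, header=write_header)
```

Each call adds a new gzip member to the file, which gzip readers (including pd.read_csv with compression='gzip') treat as one continuous stream.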

paulo.filip3

The answer that passes the gzip file handle directly to df.to_csv() does not seem to work anymore. When df.to_csv() is given no path or file-like object, it returns the CSV output as a string. This string can be encoded to bytes and written to the gzip file.

import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    compressed_file.write(df.to_csv().encode())
cmosig