
When writing a DataFrame to a CSV file, I would like to append to the file instead of overwriting it.

The pandas `DataFrame.to_csv()` method has a `mode` parameter that allows appending a DataFrame to an existing file, but none of the Polars DataFrame write methods seem to have an equivalent parameter.

Cornelius Roemer
  • Similar post regarding Parquet files [here](https://stackoverflow.com/questions/74915653/how-append-data-to-parquet-file-with-save-dataframe-from-polars). From the documentation, it looks like Polars does not support appending to CSV files either. – adamius Jan 29 '23 at 06:10
  • @adamius Yes, I saw the post about Parquet files. I wanted to be sure that Polars does not support appending to files as a DataFrame method. Thanks. – Mr. Caribbean Jan 29 '23 at 06:16
  • You can pass a filehandle/Path obj e.g. `with open("out.csv", mode="ab") as f: df.write_csv(f, has_header=False)` – jqurious Jan 29 '23 at 06:25
  • @jqurious that didn't work, because `df.write_csv()` expects a string and not a bytes object. – Mr. Caribbean Jan 29 '23 at 06:33
  • It works for me - did you use `mode="ab"` exactly? The `b` is also required. – jqurious Jan 29 '23 at 06:40
  • @jqurious that did indeed work for me! It's a nice workaround. Now I'm left wondering why Polars does not provide that capability. – Mr. Caribbean Jan 29 '23 at 06:46
  • @Mr.Caribbean I would guess that it's some combination of (in no particular order) no one asking for it, it being trivial to get the functionality with base Python, and there being higher-priority functionality to work on. – Dean MacGregor Jan 29 '23 at 15:51
  • @DeanMacGregor yes. Maybe keep it practical and focus on lazy, parallel, memory management and speed. – Mr. Caribbean Jan 30 '23 at 03:22

1 Answer


To append to a CSV file, for example, you can pass a file object opened in binary append mode:

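```python
import polars as pl

df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"a": [5, 6], "b": [7, 8]})

# Open in binary append mode ("ab") so each write lands at the end of the file.
with open("out.csv", mode="ab") as f:
    df1.write_csv(f)  # first write includes the header row
    # Skip the header on later writes. Newer Polars releases name this
    # parameter include_header instead of has_header.
    df2.write_csv(f, has_header=False)
```

```python
>>> from pathlib import Path
>>> print(Path("out.csv").read_text(), end="")
a,b
1,3
2,4
5,7
6,8
```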
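One thing to watch for with this approach: because the file is opened in append mode, running the same script a second time would write the header row again in the middle of the file. A minimal sketch of one way around that, assuming you want the header only when the file is missing or empty, might look like the following. The helper names `needs_header` and `append_csv` are hypothetical, and the `include_header` keyword is an assumption about your Polars version: recent releases use that name, while older ones (as in the snippet above) call it `has_header`.

```python
from pathlib import Path


def needs_header(path):
    """Return True if the CSV at `path` is missing or empty (hypothetical helper)."""
    p = Path(path)
    return not p.exists() or p.stat().st_size == 0


def append_csv(df, path):
    """Append a Polars DataFrame `df` to the CSV at `path` (hypothetical helper).

    The header is written only when the file is missing or empty.
    Assumption: `write_csv` accepts `include_header`; on older Polars
    releases, pass `has_header` instead.
    """
    write_header = needs_header(path)  # check before opening, since "ab" creates the file
    with open(path, mode="ab") as f:   # binary append mode, as in the answer above
        df.write_csv(f, include_header=write_header)
```

Calling `append_csv(df1, "out.csv")` followed by `append_csv(df2, "out.csv")` should then give a single header row, even if the calls happen in separate runs. This sketch does not check that the new DataFrame's columns match the ones already in the file.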
jqurious