35

I would like to write some comments in my CSV file created with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #), and then pass it to to_csv. Does anybody have a better option?

Mathieu Dubois
  • I don't think so. It's easy enough to add comments and append that this seems unnecessary. You'd also have a support issue, with some people maybe wanting different comment markers, multi-line support, etc. – EdChum Mar 24 '15 at 13:11
  • I can understand why `pandas` developers don't provide such options. I simply wanted a trick to do that. – Mathieu Dubois Mar 24 '15 at 13:14
  • [post a feature request](https://github.com/pydata/pandas/issues) or issue a pull request – EdChum Mar 24 '15 at 13:23
  • Alternatively, you can get the CSV output as a string, modify it, and then write it (see the accepted answer at http://stackoverflow.com/questions/23231605/convert-pandas-dataframe-to-csv-string) – etna Mar 24 '15 at 13:24
  • Good to know but seems a bit more complex than what I and @Vor suggested. – Mathieu Dubois Mar 24 '15 at 13:31
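The string-based route mentioned in the comments can be sketched as follows. This is only an illustration: the file name and the temporary directory are assumptions, not part of the original discussion. It relies on `to_csv()` returning the CSV text when no path is given.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# With no path argument, to_csv() returns the CSV text as a string
csv_text = df.to_csv()

# Prepend the comment lines to the CSV text
comments = '# My awesome comment\n# Here is another one\n'

# Hypothetical output location, chosen only for this sketch
path = os.path.join(tempfile.gettempdir(), 'foo_string.csv')

# newline='' writes the text as-is, without translating line endings again
with open(path, 'w', newline='') as f:
    f.write(comments + csv_text)
```

This builds the whole file content in memory first, so it is probably best suited to small data frames.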

2 Answers

59

df.to_csv accepts a file object, so you can open a file in append mode ('a'), write your comments, and pass the file object to the data frame's to_csv function.

For example:

In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})

In [37]: f = open('foo', 'a')

In [38]: f.write('# My awesome comment\n')

In [39]: f.write('# Here is another one\n')

In [40]: df.to_csv(f)

In [41]: f.close()

In [42]: more foo
# My awesome comment
# Here is another one
,a,b
0,1,1
1,2,2
2,3,3
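A variant of the same idea, as a hedged sketch: the pandas documentation recommends opening a file object with `newline=''` before passing it to `to_csv`, which is the usual cure for blank lines appearing between rows (typically seen on Windows). The path below is an assumption made for the example, and the file is opened in write mode so that rerunning the snippet does not keep appending.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location, chosen only for this sketch
path = os.path.join(tempfile.gettempdir(), 'foo_newline.csv')

# newline='' leaves line endings to pandas' CSV writer;
# the with block closes the file even if an error occurs
with open(path, 'w', newline='') as f:
    f.write('# My awesome comment\n')
    f.write('# Here is another one\n')
    df.to_csv(f)
```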
Vor
  • Yep, that was what I suggested... I think it's a bit simpler than @etna suggestion (see comments under my question). – Mathieu Dubois Mar 24 '15 at 13:31
  • @Mathieu Dubois sorry, just realized that you suggested exactly the same thing =) – Vor Mar 24 '15 at 13:33
  • Might be because this is a response from 2015 and the pandas version in 2020 is slightly different, but the current answer outputs a blank line between each row. The code below does it without the blank lines. You also have more freedom with mode='a' from pandas for append: `f = open(path, 'a'); f.write('comment\n'); f.close(); df.to_csv(path, mode='a')` – Tim Johnsen Sep 25 '20 at 21:41
9

An alternative approach to @Vor's solution is to first write the comment to a file, and then use mode='a' with to_csv() to add the content of the data frame to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment, and then passing the file handle to pandas (as per @Vor's answer). The similar timings make sense considering that this is what pandas is doing internally: DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open().

On a separate note, it is convenient to work with file IO via a with statement, which ensures that opened files are closed once you leave the with block. See examples in the benchmarks below.
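To make the with-statement pattern reusable, one option is a small wrapper. The helper name `to_csv_with_comments`, its parameters, and the output path are hypothetical and only meant as a sketch. Reading the file back with `read_csv(comment='#')` shows that the comment lines are skipped.

```python
import os
import tempfile

import pandas as pd


def to_csv_with_comments(df, path, comments, **to_csv_kwargs):
    """Hypothetical helper: write '#'-prefixed comment lines, then the data frame."""
    # The with block closes the file as soon as the data has been written
    with open(path, 'w', newline='') as f:
        for line in comments:
            f.write(f'# {line}\n')
        df.to_csv(f, **to_csv_kwargs)


df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location, chosen only for this sketch
path = os.path.join(tempfile.gettempdir(), 'commented.csv')
to_csv_with_comments(df, path, ['My awesome comment', 'Here is another one'], index=False)

# comment='#' tells read_csv to ignore everything from '#' to the end of a line
round_trip = pd.read_csv(path, comment='#')
```

Note that `comment='#'` also cuts off any data value that contains a `#`, so this round trip is only safe when the data itself never uses that character.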

Read in test data

import pandas as pd
# Read in the iris data frame from the seaborn GitHub location
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# Create a bigger data frame
while iris.shape[0] < 100000:
    # DataFrame.append was removed in pandas 2.0; pd.concat doubles the frame the same way
    iris = pd.concat([iris, iris])
# `iris.shape` is now (153600, 5)

1. Append with the same file handle

%%timeit -n 5 -r 5

# Open a file in append mode to add the comment
# Then pass the file handle to pandas
with open('test1.csv', 'a') as f:
    f.write('# This is my comment\n')
    iris.to_csv(f)
972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

2. Reopen the file with to_csv(mode='a')

%%timeit -n 5 -r 5

# Open a file in write mode to add the comment
# Then close the file and reopen it with pandas in append mode
with open('test2.csv', 'w') as f:
    f.write('# This is my comment\n')
iris.to_csv('test2.csv', mode='a')
949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
joelostblom