
How do I append new DataFrame data to the end of an existing CSV file? The to_csv documentation doesn't mention such functionality.

perigee
  • Possible duplicate of [How to add pandas data to an existing csv file?](https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file) – 9769953 Sep 26 '18 at 14:53

4 Answers


You can append using to_csv by passing a file which is open in append mode:

with open(file_name, 'a') as f:
    df.to_csv(f, header=False)

Use header=False so as not to append the column names.

In fact, pandas has a wrapper to do this in to_csv using the mode argument (see Joe's answer):

df.to_csv(file_name, mode='a', header=False)
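
For illustration, here is a minimal end-to-end sketch of the append workflow (the file name and data below are made up, not from the question):

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})   # made-up example data
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# The first write creates the file and includes the header row
df1.to_csv('example.csv', index=False)

# Later writes append rows without repeating the header
df2.to_csv('example.csv', mode='a', index=False, header=False)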
Andy Hayden
  • Also need to close the file with f.close(). Andy, you made my day. It works like a charm; I'm coming from a C/C++ background and need to learn the Python philosophy. Any suggestions? – perigee Jun 16 '13 at 16:08
  • Andy, really appreciated :-D (cannot use @ symbol :-() – perigee Jun 16 '13 at 16:17
  • Bonus points that this closes the file after to_csv. I have some code that hits to_csv a lot and was finding the files left open on later iterations. – Ezekiel Kruglick Jun 21 '15 at 04:11
  • @EzekielKruglick Were you passing an open file to `to_csv` or the filename? I recall a related issue where *not* closing the file led to a 99% speedup of their code (IIRC they were appending to the same file tens of thousands of times). – Andy Hayden Jun 26 '15 at 20:52
  • @Andy Hayden - First I tried passing a filename, then I moved to this with-block style and passing a handle, and even added an f.close() within the with context; I still get the file left open maybe one time in ten thousand. I'm starting to suspect the operating system (Windows) is at fault, actually, perhaps the shadow copy service used for backup tracking, although I have not yet properly determined that. So, quite possibly not Python related, I'm thinking now. – Ezekiel Kruglick Jun 27 '15 at 18:43
  • @EzekielKruglick Funny you should say that, with regards to the 99% statistic above - that was on Windows, I only saw 33% on OSX. IIRC the `close` was the slow part. My guess is that it triggers writing out the buffer from memory to disk (copying the entire file), whilst unix has a cleverer filesystem... not really sure. I strongly recommend looking into HDF5/pytables. – Andy Hayden Jun 28 '15 at 03:20
  • @Andy Hayden - yes, I use HDF5 too and have never had any issues with files left open with HDF5, even in the same code base. – Ezekiel Kruglick Jul 01 '15 at 04:17
  • @AndyHayden - Just a belated follow-up to the discussion above. After much detailed examination, including finally finding a reproducible way to trigger the file problem, I've determined that the problem of the csv files being left open was not in pandas. That's what I thought (above), but I'm certain now. Just a really convoluted logic path in some supporting filesystem APIs! :) – Ezekiel Kruglick Aug 06 '15 at 20:51
  • @EzekielKruglick I'm sure this would make a nice blog post :) – Andy Hayden Aug 07 '15 at 05:29
  • @perigee When `with` is used, the file is always closed automatically: http://blog.lerner.co.il/dont-use-python-close-files-answer-depends/ – lesolorzanov Nov 14 '18 at 09:53
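
To illustrate the point from the comments above (a minimal sketch; the file name and data are arbitrary), the handle is closed automatically once the with block exits, so no explicit f.close() is needed:

with open('example.csv', 'a') as f:
    f.write('5,6\n')      # any write done through the handle

print(f.closed)           # True: the with block closed the file on exit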

You can also pass the file mode as an argument to the to_csv method:

df.to_csv(file_name, header=False, mode='a')
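
A common refinement of this (a sketch with made-up names, not part of the original answer) is to write the header only when the file does not exist yet, so the first call creates the file with column names and later calls just append rows:

import os
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]})   # made-up example data
file_name = 'example.csv'                 # made-up path

df.to_csv(file_name, mode='a', index=False,
          header=not os.path.isfile(file_name))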
Joe Hooper

A little helper function I use (based on Joe Hooper's answer), with some header-checking safeguards, to handle it all:

import os
import pandas as pd

def appendDFToCSV_void(df, csvFilePath, sep=","):
    # If the file doesn't exist yet, create it and write the header row.
    if not os.path.isfile(csvFilePath):
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep)
        return
    # Read just the existing header row once for the sanity checks below.
    csv_columns = pd.read_csv(csvFilePath, nrows=1, sep=sep).columns
    if len(df.columns) != len(csv_columns):
        raise Exception("Columns do not match!! Dataframe has " + str(len(df.columns)) +
                        " columns. CSV file has " + str(len(csv_columns)) + " columns.")
    if not (df.columns == csv_columns).all():
        raise Exception("Columns and column order of dataframe and csv file do not match!!")
    # Columns line up, so append the rows without rewriting the header.
    df.to_csv(csvFilePath, mode='a', index=False, sep=sep, header=False)
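
A hypothetical usage example of the helper above (the column names and file path are made up):

import pandas as pd

df_first = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df_more = pd.DataFrame({'x': [5], 'y': [6]})

appendDFToCSV_void(df_first, 'log.csv')   # creates log.csv and writes the header
appendDFToCSV_void(df_more, 'log.csv')    # header checks pass, rows are appended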
KCzar
  • Is there an API setting for the 3rd test case, column order not matching between dataframe and csv? I want to write without headers, but have the columns be implicitly reordered. – floatingice Jun 07 '19 at 17:36
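
Regarding the comment above: there is no single setting for this, but one way to handle a column-order mismatch (a sketch, with made-up names) is to read the existing header and pass it to the columns argument of to_csv, which writes the DataFrame's columns in that order:

import pandas as pd

df = pd.DataFrame({'y': [7], 'x': [8]})                # columns in a different order
csv_columns = pd.read_csv('log.csv', nrows=1).columns  # existing header order
df.to_csv('log.csv', mode='a', index=False, header=False,
          columns=csv_columns)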

Thanks to Andy, here is the complete solution:

f = open(filename, 'a')  # open the file in append mode
df.to_csv(f, header=False)
f.close()
perigee
  • Just to mention, this is essentially equivalent to the above, but after this you're left with a closed file (f), whereas [`with`](http://preshing.com/20110920/the-python-with-statement-by-example) cleans that up for you. :) – Andy Hayden Jun 16 '13 at 16:28