
I am running some numerical simulations with Python, pandas and SciPy. I run a set of scenarios, and for each scenario I create a detailed dataframe with lots of outputs, which I save to a separate CSV file. Each CSV file is about 900 KB.

The line I use is simply:

mydataframe.to_csv('myoutput.csv')

My question is: is there a way to speed up the exporting process? Some specific parameters, a different library, etc.? I ask because writing to CSV takes almost half of the total simulation time: running 18 scenarios takes 17 seconds, 7.2 of which are spent in the to_csv method.
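A minimal way to reproduce this kind of measurement (the 7.2 s figure above was measured in the actual simulation; the shape below is just an illustrative stand-in matching the 300-400 column dataframes mentioned in the comments):

```python
import time

import numpy as np
import pandas as pd

# A dataframe roughly the shape described: a few hundred columns of floats.
df = pd.DataFrame(np.random.rand(2000, 350))

# Time the export in isolation to see how much of the run it accounts for.
start = time.perf_counter()
df.to_csv('myoutput.csv')
elapsed = time.perf_counter() - start
print(f'to_csv took {elapsed:.2f} s')
```

Running this under cProfile instead of a manual timer would break the cost down further (string conversion vs. actual file I/O).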

PS I had initially wanted to write to Excel, but that's too slow, as per my other question: Python: fastest way to write pandas DataFrame to Excel on multiple sheets

Pythonista anonymous
  • Have you profiled this? Can you compare the performance using [`np.savetxt`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html)? – EdChum Jul 09 '15 at 10:56
  • Each dataframe has 300 to 400 columns. How can I get np.savetxt to write column headings? I understand it has a header argument, but it doesn't seem to accept a list of column names. – Pythonista anonymous Jul 09 '15 at 11:25
  • As you can read in the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html), it is a string that will be written at the beginning of the file. So you can do `','.join(mydataframe.columns)` – joris Jul 09 '15 at 13:38
  • I can't get np.savetxt to work with non-numerical arrays, which is a problem because my dataframe has many text fields. – Pythonista anonymous Aug 18 '15 at 15:08
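The np.savetxt approach suggested in the comments can be sketched as follows. Note the caveat from the last comment: this only works for purely numeric data, so it is not a drop-in replacement when the dataframe contains text columns.

```python
import numpy as np
import pandas as pd

# Purely numeric dataframe -- np.savetxt cannot handle mixed/text columns.
df = pd.DataFrame(np.random.rand(1000, 5),
                  columns=[f'col{i}' for i in range(5)])

# header takes a single string, so join the column names manually,
# as suggested in the comments. comments='' suppresses the default
# '# ' prefix np.savetxt would otherwise put before the header line.
np.savetxt('myoutput.csv',
           df.to_numpy(),
           delimiter=',',
           header=','.join(df.columns),
           comments='')
```

Unlike to_csv, this writes no index column, so the output is not byte-for-byte equivalent to the pandas export.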

1 Answer


Try compressing the file:

mydataframe.to_csv('myoutput.gz', compression='gzip')
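A sketch of how this looks in a round trip (whether compression actually speeds up the export depends on how fast the disk is relative to the gzip compressor; it shrinks the file, but trades I/O for CPU):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 5))

# Write a gzip-compressed CSV; pandas handles the compression transparently.
df.to_csv('myoutput.gz', compression='gzip')

# The compressed file reads back just as transparently.
df2 = pd.read_csv('myoutput.gz', compression='gzip', index_col=0)
```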

VnC