I have a dataframe with ~7 million rows and 18 columns.

What is the fastest library for writing it to a CSV file?

Right now I am writing the dataframe using:

df.to_csv('file.csv', header=True, index=False)

And it is taking me ~3 hours.

The resulting file is ~800 MB.

Is there a faster method/library to speed up the writing process?

eduardo2111
You could try `numpy.savetxt`. Take a look at this [answer](https://stackoverflow.com/a/54617862/13676202). – MrNobody33 Jul 13 '20 at 17:54

1 Answer

Try using https://pypi.org/project/pyarrow/

I've found it to be 86% faster for reading and 30% faster for writing CSV files as compared to pandas!