
I am using this piece of code to read a CSV (around 1 GB) with pandas and then write it into multiple Excel sheets using chunksize.

    import pandas as pd

    # 1,000,000-row chunks keep each sheet under Excel's 1,048,576-row limit
    with pd.ExcelWriter('/tmp/output.xlsx', engine='xlsxwriter') as writer:
        reader = pd.read_csv(f'/tmp/{file_name}', sep=',', chunksize=1000000)
        for idx, chunk in enumerate(reader):
            chunk.to_excel(writer, sheet_name=f"Report (P_{idx + 1})", index=False)
        # no explicit writer.save() needed: the context manager saves on exit,
        # and calling save() as well raises an error in newer pandas versions

This approach is taking a lot of time. Can anyone please suggest an approach to reduce this time?

itas97
  • Do you want to use Pandas library only? There are other libraries which will work better for this purpose. – Prakash Palnati Aug 12 '20 at 06:49
  • I can use other Python packages as long they are open source and don't have any licensing issues. – itas97 Aug 12 '20 at 07:01
  • Does this answer your question? [Pandas - split large excel file](https://stackoverflow.com/questions/41321082/pandas-split-large-excel-file) – AtanuCSE Aug 12 '20 at 07:14
  • you can do this with python's csv module and openpyxl. no need for Pandas here, since there is no data manipulation involved. Have a look at my [answer](https://stackoverflow.com/a/61927477/7175713) and see if it is relevant. Also, how long does your current process take? – sammywemmy Aug 12 '20 at 07:44
  • I finally used [PyExcelerate](https://pypi.org/project/PyExcelerate/) for my use case (a sketch of that approach follows these comments) – itas97 Nov 17 '20 at 15:15
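
The asker's final PyExcelerate code is not shown, so this is only a minimal sketch of how it could replace the `to_excel` calls above; the path, the `file_name` variable, and the sheet names simply mirror the question:

    import pandas as pd
    from pyexcelerate import Workbook

    wb = Workbook()
    reader = pd.read_csv(f'/tmp/{file_name}', sep=',', chunksize=1000000)
    for idx, chunk in enumerate(reader):
        # PyExcelerate writes plain rows, so prepend the column names as a header row
        rows = [chunk.columns.tolist()] + chunk.values.tolist()
        wb.new_sheet(f"Report (P_{idx + 1})", data=rows)
    wb.save('/tmp/output.xlsx')

PyExcelerate is fast largely because it skips the per-cell styling machinery that pandas' Excel engines go through; sammywemmy's csv-plus-openpyxl suggestion above works on the same principle.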

1 Answer


Some days ago I faced the same problem; here are a few things I tried.

You can use a library called Vaex: https://vaex.readthedocs.io/en/latest/
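
Vaex is aimed at out-of-core reading and analysis rather than Excel export (as the asker notes in a comment below), so at best it speeds up the read side. A minimal sketch, assuming a concrete '/tmp/input.csv' in place of the question's `file_name`:

    import vaex

    # one-time conversion to HDF5; afterwards the data is memory-mapped,
    # so repeated reads and filters are fast
    df = vaex.from_csv('/tmp/input.csv', convert=True, chunk_size=1_000_000)
    print(len(df), df.column_names)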

Or, if you want to keep a pandas-style workflow on large data, try Apache PySpark.
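
PySpark likewise parallelises the CSV read and any transformations, but it has no built-in .xlsx writer, so the Excel step would still need a separate library. A minimal sketch, again with an assumed input path:

    from pyspark.sql import SparkSession

    # Spark splits the CSV read and row processing across cores
    spark = SparkSession.builder.appName("csv-to-excel").getOrCreate()
    df = spark.read.csv('/tmp/input.csv', header=True)
    print(df.count())
    spark.stop()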

Or you can use Google Cloud with its 1200 credits.

  • I want to write the large CSV to an Excel file with multiple sheets. I was not able to find any Excel-related utility in this package. Visualization is not my requirement. – itas97 Aug 12 '20 at 06:59
  • Then use Google Colab or Google Cloud for free, else there is no option –  Aug 12 '20 at 07:26