I get a large Excel file (100 MB+) which has data for various markets. I have specific filters for my market, like 'Country Name': 'UK' and 'Initiation Date': after "Jan 2010". I wanted to write a Python program to automate this filtering and write the matching data to a new Excel file, but openpyxl takes too much time loading an Excel file this big. I also tried a combination of openpyxl and xlsxwriter, where I read the file in read_only mode by iterating over rows in openpyxl and wrote them to a new file with xlsxwriter, but this takes too much time as well. Is there any simpler way to achieve this?
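For reference, here is a minimal sketch of the openpyxl + xlsxwriter approach I tried; the file name, sheet, and column positions are placeholders, not my real layout:

from datetime import datetime
from openpyxl import load_workbook
import xlsxwriter

wb = load_workbook('large_file.xlsx', read_only=True)  # placeholder file name
ws = wb.active  # assumes the data is on the first sheet

out = xlsxwriter.Workbook('filtered.xlsx')
out_ws = out.add_worksheet()

rows = ws.iter_rows(values_only=True)
header = next(rows)
out_ws.write_row(0, 0, header)

cutoff = datetime(2010, 1, 1)
out_row = 1
for row in rows:
    country, init_date = row[0], row[1]  # assumed column positions
    if country == 'UK' and isinstance(init_date, datetime) and init_date >= cutoff:
        # stringify cells so this sketch needs no xlsxwriter date formats
        out_ws.write_row(out_row, 0, ['' if c is None else str(c) for c in row])
        out_row += 1

out.close()
wb.close()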
1 Answer
Not sure whether pandas can handle very large files, but did you try pandas?
import pandas
mydf = pandas.read_excel('large_file.xlsx')
At read time you can leave out columns you don't need (the usecols parameter of read_excel).
Then filter your dataframe as discussed here: Select rows from dataframe.
Finally, write the dataframe back to Excel (a combined sketch of all three steps follows below):
mydf.to_excel('foo.xlsx', sheet_name='Sheet1')
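
Putting the steps together, a minimal sketch; the column names 'Country Name' and 'Initiation Date' come from the question, while the file name and the extra 'Value' column are assumptions for illustration:

import pandas as pd

# read only the columns you need; parse the date column as datetimes
mydf = pd.read_excel(
    'large_file.xlsx',
    usecols=['Country Name', 'Initiation Date', 'Value'],
    parse_dates=['Initiation Date'],
)

# keep only UK rows initiated after Jan 2010
mask = (mydf['Country Name'] == 'UK') & (mydf['Initiation Date'] >= '2010-01-01')
filtered = mydf[mask]

# write the filtered rows to a new workbook
filtered.to_excel('foo.xlsx', sheet_name='Sheet1', index=False)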

knobi
This won't really help as Pandas uses openpyxl to read the file. – Charlie Clark Sep 29 '20 at 09:31