I get a large Excel file (100 MB+) which has data for various markets. I have specific filters for my market, like 'Country Name': 'UK' and 'Initiation Date': after "Jan 2010". I wanted to write a Python program to automate this filtering and write the matching data to a new Excel file, but openpyxl takes too much time loading an Excel file this big. I also tried a combination of openpyxl and xlsxwriter, where I read the file in read_only mode by iterating over rows in openpyxl and wrote them to a new file with xlsxwriter, but this takes too much time as well. Is there any simpler way to achieve this?
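For reference, here is a minimal sketch of the openpyxl + xlsxwriter approach I tried; the file name, sheet, and column positions are placeholders, not my real layout:

from datetime import datetime
from openpyxl import load_workbook
import xlsxwriter

wb = load_workbook('large_file.xlsx', read_only=True)  # placeholder file name
ws = wb.active  # assumes the data is on the first sheet

out = xlsxwriter.Workbook('filtered.xlsx')
out_ws = out.add_worksheet()

rows = ws.iter_rows(values_only=True)
header = next(rows)
out_ws.write_row(0, 0, header)

cutoff = datetime(2010, 1, 1)
out_row = 1
for row in rows:
    country, init_date = row[0], row[1]  # assumed column positions
    if country == 'UK' and isinstance(init_date, datetime) and init_date >= cutoff:
        # stringify cells so this sketch needs no xlsxwriter date formats
        out_ws.write_row(out_row, 0, ['' if c is None else str(c) for c in row])
        out_row += 1

out.close()
wb.close()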
1 Answer
Not sure whether pandas can handle very large files, but did you try pandas?
import pandas
mydf = pandas.read_excel('large_file.xlsx')
At read time you can leave out columns you don't need (the usecols parameter of read_excel).
Then filter your dataframe as discussed here: Select rows from dataframe.
Finally, write the dataframe back to Excel (a combined sketch of all three steps follows below):
mydf.to_excel('foo.xlsx', sheet_name='Sheet1')
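
Putting the steps together, a minimal sketch; the column names 'Country Name' and 'Initiation Date' come from the question, while the file name and the extra 'Value' column are assumptions for illustration:

import pandas as pd

# read only the columns you need; parse the date column as datetimes
mydf = pd.read_excel(
    'large_file.xlsx',
    usecols=['Country Name', 'Initiation Date', 'Value'],
    parse_dates=['Initiation Date'],
)

# keep only UK rows initiated after Jan 2010
mask = (mydf['Country Name'] == 'UK') & (mydf['Initiation Date'] >= '2010-01-01')
filtered = mydf[mask]

# write the filtered rows to a new workbook
filtered.to_excel('foo.xlsx', sheet_name='Sheet1', index=False)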

knobi
This won't really help as Pandas uses openpyxl to read the file. – Charlie Clark Sep 29 '20 at 09:31