
I get a large Excel file (100 MB+) containing data for various markets. I have specific filters for my market, like 'Country Name': 'UK' and 'Initiation Date': after "2010 Jan". I wanted to write a Python program to automate filtering the data and writing it to a new Excel file, but openpyxl takes too long to load a file this big. I also tried a combination of openpyxl and xlsxwriter, reading the file in read_only mode by iterating over rows with openpyxl and writing them to a new file with xlsxwriter, but this takes too much time as well. Is there a simpler way to achieve this?
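For reference, the openpyxl/xlsxwriter combination I tried looks roughly like this (a minimal sketch; the file names are placeholders and the filtering logic would go inside the loop):

from openpyxl import load_workbook
import xlsxwriter

# stream the source workbook row by row instead of loading it all into memory
wb = load_workbook('large_file.xlsx', read_only=True)
ws = wb.active

out = xlsxwriter.Workbook('copy.xlsx')
out_ws = out.add_worksheet()

# copy every row across cell by cell
for row_idx, row in enumerate(ws.iter_rows(values_only=True)):
    for col_idx, value in enumerate(row):
        out_ws.write(row_idx, col_idx, value)

out.close()
wb.close()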

1 Answer


Not sure whether pandas can handle very large files, but did you try pandas?

import pandas as pd
mydf = pd.read_excel('large_file.xlsx')

When reading, you can leave out columns you don't need (read_excel has a usecols parameter for this), which cuts both loading time and memory use.
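A minimal sketch, assuming the column names from your question ('Country Name' and 'Initiation Date') plus a hypothetical 'Market Size' column:

# 'Market Size' is a hypothetical name; list whichever columns you actually need
mydf = pd.read_excel('large_file.xlsx', usecols=['Country Name', 'Initiation Date', 'Market Size'])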

then filter your DataFrame, as discussed here: Select rows from dataframe
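For the filters in your question, a minimal sketch (the exact cutoff for after "2010 Jan" is an assumption; adjust as needed):

# parse the date column so comparisons work, then apply both filters
mydf['Initiation Date'] = pd.to_datetime(mydf['Initiation Date'])
mydf = mydf[(mydf['Country Name'] == 'UK') & (mydf['Initiation Date'] > '2010-01-31')]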

then write the DataFrame back to Excel:

mydf.to_excel('foo.xlsx', sheet_name='Sheet1')
knobi