Hope you can help me.
I have a folder where there are several .xlsx files with similar structure (NOTE that some of the files might be bigger than 50MB). I want to combine them all together and (eventually) send them to a database. But before that, I need to improve the performance of this block of code because sometimes it takes a lot of time to process all those files.
The code in question is this:
df_list = []
for file in location:
df_list.append(pd.read_excel(file, header=0, engine='openpyxl'))
df_concat = pd.concat(df_list)
Any suggestions?
Somewhere I read that converting Excel files to CSV might improve the performance, but should I do that before appending the files or after everything is concatenated? And considering df_list is a list, can I do that conversion?