I'm trying to merge multiple Excel files with pandas using the following code:
import pandas as pd
from os.path import basename

frames = []
for f in ['test1.xlsx', 'test2.xlsx']:
    data = pd.read_excel(f, sheet_name='Sheet1')
    # Label each row with the file it came from
    data.index = [basename(f)] * len(data)
    frames.append(data)

df = pd.concat(frames)
df.to_excel("merged_data2.xlsx")
This works fine on the test files, but when I try it on three other Excel files of over 100 MB each, the process becomes too slow to be useful. I saw this other post on the subject: Why does concatenation of DataFrames get exponentially slower?
I believe I have correctly followed the advice there of appending to a list and calling pd.concat only once, but without success. Any ideas? Thanks.
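For reference, here is a minimal sketch of the same merge written with `pd.concat`'s `keys` parameter, which labels each block by filename instead of overwriting every frame's index in the loop. Small in-memory DataFrames stand in for the Excel reads here; the result has a MultiIndex (filename, original row index) rather than the flat index in my code above.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by pd.read_excel
frames = {
    "test1.xlsx": pd.DataFrame({"a": [1, 2]}),
    "test2.xlsx": pd.DataFrame({"a": [3, 4]}),
}

# `keys` tags each block with its filename, so no manual index rewrite is needed
merged = pd.concat(frames.values(), keys=frames.keys())
print(merged)
```

That said, I suspect the real bottleneck here is `read_excel` on the 100 MB files rather than the concatenation itself.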