Merge based on multiple columns of all excel files from a directory in Python

Question

Say I have a dataframe df, and a directory ./ which has the following excel files inside:

path = './'
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(('.xls', '.xlsx')):
            print(os.path.join(root, file))
            # dfs.append(read_dfs(os.path.join(root, file)))
# df = reduce(lambda left, right: pd.concat([left, right], axis = 0), dfs)

Out:

df1.xlsx,
df2.xlsx,
df3.xls
...

I want to merge df with all files from path based on common columns date and city. It works with the following code, but it's not concise enough.

So I raise a question for improving the code, thank you.

df = pd.merge(df, df1, on = ['date', 'city'], how='left')
df = pd.merge(df, df2, on = ['date', 'city'], how='left')
df = pd.merge(df, df3, on = ['date', 'city'], how='left')
...

Reference:

pandas three-way joining multiple dataframes on columns

score 1 · Accepted Answer · answered Apr 23 '20 at 11:07

1

The following code may works:

from functools import reduce

dfs = [df0, df1, df2, dfN]
df_final = reduce(lambda left, right: pd.merge(left, right, on=['date', 'city']), dfs)

answered Apr 23 '20 at 11:07

ah bon

9,293
12
65
148

Merge based on multiple columns of all excel files from a directory in Python

1 Answers1

Linked