2

I have 5 dataframes as follows:

df1:
  provinceCode   provinceName
0    11          XXX
1    12          XXX
2    13          XXX
3    14          XXX
4    15          XXX

df2:
   cityCode  cityName  provinceCode
0  1101       XXX            11
1  1201       XXX            12
2  1301       XXX            13
3  1302       XXX            13
4  1303       XXX            13
df3:
  areaCode  areaName  cityCode  provinceCode
0  110101    XXX      1101            11
1  110102    XXX      1101            11
2  110105    XXX      1101            11
3  110106    XXX      1101            11
4  110107    XXX      1101            11
df4:
   streetCode    streetName  areaCode  provinceCode  cityCode
0  110101001       XXX       110101       11      1101
1  110101002       XXX       110101       11      1101
2  110101003       XXX       110101       11      1101
3  110101004       XXX       110101       11      1101
4  110101005       XXX       110101       11      1101
df5:
   villageCode    villageName         streetCode  provinceCode  cityCode  areaCode
0  110101001001   XXX   110101001            11      1101    110101
1  110101001002   XXX   110101001            11      1101    110101
2  110101001005   XXX   110101001            11      1101    110101
3  110101001006   XXX   110101001            11      1101    110101
4  110101001007   XXX   110101001            11      1101    110101

I hope to merge all dfs together especially their name and drop codes columns. I have tried merge them one by one which is quite inefficient. Just wonder if there is a better way to do that? Thanks. Here is what I have tried:

df6 = df5.merge(df4[['code', 'name']], left_on = 'provinceCode', right_on = 'code', how = 'left')  
df7 = df6.merge(df3[['code', 'name']], left_on = 'areaCode', right_on = 'code', how = 'left')   
df8 = df7.merge(df2[['code', 'name']], left_on = 'areaCode', right_on = 'code', how = 'left')    
df = df8.merge(df1[['provinceCode', 'provinceName']], left_on = 'provinceCode', right_on = 'code', how = 'left')
ah bon
  • 9,293
  • 12
  • 65
  • 148
  • 1
    The section on "merging multiple DataFrames" tells you how to do this using `reduce` and `partial`. If you have any questions, let me know and I will help you. – cs95 Jan 05 '19 at 06:19
  • dfs = [df1, df2, df3, df4, df5] from functools import partial outer_merge = partial(pd.merge, how='outer') reduce(outer_merge, dfs) – ah bon Jan 05 '19 at 06:28
  • 1
    Look good. Does it work? – cs95 Jan 05 '19 at 06:29
  • Yes. Thanks, master!! :) – ah bon Jan 05 '19 at 06:32
  • 1
    Reduce is by far better than chained mergers. Both short code, elegant and fast. The only issue is that we sacrifice readability. I have added DRY on the answer in case readability is desired over efficiency. – Prayson W. Daniel Jan 05 '19 at 07:08

1 Answers1

2

Updated answer courtesy @OP:

dfs = [df1, df2, df3, df4, df5] 
from functools import partial 
outer_merge = partial(pd.merge, how='outer') 
reduce(outer_merge, dfs)

(Old Answer - don't use)

You don’t have to create a variation each circle. Chain them:

 df= df5.merge(df4[['code', 'name']],
            left_on='provinceCode', 
            right_on='code', 
            how='left'
            ).merge(df3[['code', 'name']], 
            left_on='areaCode', 
            right_on='code', 
            how = 'left'
            ).merge(df2[['code', 'name']], 
            left_on='areaCode',
            right_on='code',
            how ='left'
            ).merge(df1[['provinceCode', 'provinceName']],
            left_on='provinceCode',
            right_on='code',
            how='left')

Update: Reduce is awesome in simplicity and speed, but for readability, it is less readable compared to mergers: We could DRY the code:

common_joins = dict(right_on='code', how='left')
common_columns = ['code', 'name']

df= df5.merge(df4[common_columns],
            left_on='provinceCode', 
            **common_joins
            ).merge(df3[common_columns], 
            left_on='areaCode', 
            **common_joins
            ).merge(df2[common_columns], 
            left_on='areaCode',
            **common_joins
            ).merge(df1[['provinceCode', 'provinceName']],
            left_on='provinceCode',
            **common_joins)
Prayson W. Daniel
  • 14,191
  • 4
  • 51
  • 57