1

I wrote a function to merge multiple DataFrames, following this page.
I also want to specify a suffix for each DataFrame, as shown below.
However, I get ValueError: too many values to unpack (expected 2).
I understand that passing a tuple with more than two elements as suffixes causes this, but I don't know how to rewrite the code to fix it.
Can anyone tell me how to write it?

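from functools import reduce
import pandas as pd

def agg_df(dfList, suffix):
    # pd.merge expects suffixes to be a 2-tuple (left, right); passing 4 values raises the ValueError
    temp=reduce(lambda left, right: pd.merge(left, right, left_index=True, right_index=True, 
                                             how='outer', suffixes=suffix), dfList)
    return temp

df=agg_df([df_cool, df_light, df_sp, df_hvac], ('_chiller', '_light', '_sp', '_hvac'))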
James
Katsuya Obara

2 Answers

9

You can add the suffixes before merging, with add_suffix (note that it renames every column of a DataFrame, not only the overlapping ones):

dfs = {0: df_cool, 1: df_light, 2: df_sp, 3: df_hvac}
suffix = ('_chiller', '_light', '_sp', '_hvac')
for i in dfs:
    dfs[i] = dfs[i].add_suffix(suffix[i])

Then remove the suffixes argument from merge and you're done:

def agg_df(dfList):
    temp=reduce(lambda left, right: pd.merge(left, right, 
                                             left_index=True, right_index=True, 
                                             how='outer'), dfList)
    return temp

df = agg_df(dfs.values())
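
For reference, the two steps above can be folded into a single function that keeps the signature from the question. This is only a minimal sketch: the toy DataFrames and the `kW` column name are made up for illustration, and the length check is an extra assumption, not part of the original code.

```python
from functools import reduce

import pandas as pd


def agg_df(df_list, suffixes):
    """Outer-join DataFrames on their index, tagging each frame's columns with its own suffix."""
    if len(df_list) != len(suffixes):
        raise ValueError("need exactly one suffix per DataFrame")
    # add_suffix renames every column, so no names collide during the merges
    renamed = [df.add_suffix(s) for df, s in zip(df_list, suffixes)]
    return reduce(
        lambda left, right: pd.merge(left, right, left_index=True,
                                     right_index=True, how="outer"),
        renamed,
    )


# Hypothetical stand-ins for the frames in the question
df_cool = pd.DataFrame({"kW": [1.0, 2.0]}, index=[0, 1])
df_light = pd.DataFrame({"kW": [3.0, 4.0]}, index=[1, 2])
df_sp = pd.DataFrame({"kW": [5.0]}, index=[0])

df = agg_df([df_cool, df_light, df_sp], ("_chiller", "_light", "_sp"))
print(df.columns.tolist())  # ['kW_chiller', 'kW_light', 'kW_sp']
```

Because the merge is an outer join on the index, rows missing from one frame show up as NaN in that frame's columns.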
andrew_reece
0

Merge in PySpark doesn't have a suffixes option, but you can do it using Koalas:

    import databricks.koalas as ks

    left_kdf = ks.DataFrame(hist_sls_cy)
    right_kdf = ks.DataFrame(hist_sls_ly)
    kdf_cmbnd = left_kdf.merge(right_kdf,on=['x1','x2'],how='left',suffixes=('','_last'))
Arjun