1

This is a followup from this question.

I am trying to merge two dataframe on a common column , but one dataframe has multiple columns with same name. I want to use that column to merge.

These are my dataframes:

df1 = pd.DataFrame([['abc', 'xyz'], ['abc', 'xyz'], ['xyz', 'abc']], columns=['max_speed', 'min_speed'])
df2 = pd.DataFrame([['abc', 'xyz'], ['abc', 'xyz'], ['xyz', 'abc']], columns=['max_speed', 'max_speed'])

I have a special case where my dataframe has multiple columns with same name(reference).

I tried using the location of the column rather than name of the column to access but didnt work

What I tried:

df3 = df1.merge(df2, right_on=df1.columns[0],left_on=df2.columns[0])

Error:

ValueError: The column label 'max_speed' is not unique.

Let me know how can I solve this.

Murtaza Haji
  • 1,093
  • 1
  • 13
  • 32

1 Answers1

1

you can set_index with the column in the position you want in df2, then use merge with right_index=True.

print (df1.merge(df2.set_index(df2.iloc[:,0])
                    .iloc[:, 1:], #to select only the other column max_speed
                 left_on='max_speed', right_index=True, 
                 suffixes=('','_'))
      )
  max_speed min_speed max_speed_
0       abc       xyz        xyz
0       abc       xyz        xyz
1       abc       xyz        xyz
1       abc       xyz        xyz
2       xyz       abc        abc
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • Can I keep only the first `max_speed` column? Basically I have over 10-15 max_speed columns in the other df. So it would be good to have only one max_speed at the end. – Murtaza Haji Jul 14 '20 at 00:11
  • @MurtazaHaji not sure to understand what you want. the fact there are already 2 column with same name is not intuitive. the way I understand is either in the iloc it could be `.iloc[:, 1:2]` to keep only one other column max_speed from df2, or just select the first N columns, where N is the number of column from df1, on the result of the merge – Ben.T Jul 14 '20 at 00:20