1

I have two data from.

df1 with columns: id,x1,x2,x3,x4,....xn

df2 with columns: id,y.

df3 =pd.concat([df1,df2],axis=1)

when I use pandas concat to combine them, it became

id,y,id,x1,x2,x3...xn.

there are two id here.How can I get rid of one.

I have tried :

df3=pd.concat([df1,df2],axis=1).drop_duplicates().reset_index(drop=True).

but not work.

Zoe
  • 1,402
  • 1
  • 12
  • 21
Yuuu
  • 63
  • 6

2 Answers2

1

DataFrames are concatenated on the index. Make sure that id is the index before concatenating:

df3 = pd.concat([df1.set_index('id'), 
                 df2.set_index('id')], axis=1).reset_index()

Or, better yet, use join:

df3 = df1.join(df2, on='id')
DYZ
  • 55,249
  • 10
  • 64
  • 93
  • Thank you so much. The first one works. The second shows "You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat" – Yuuu Sep 23 '19 at 22:27
  • Apparently your `id` columns in the two dataframes are of different data types: one is strings, the other is numbers. They may look the same, but they are not. Convert them to the same datatype with `.astype()`. – DYZ Sep 23 '19 at 22:28
0

drop_duplicates() only removes rows that are completely identical.

what you're looking for is pd.merge().

pd.merge(df1, df2, on='id)

Zoe
  • 1,402
  • 1
  • 12
  • 21