how to remove duplicates when using pandas concat to combine two dataframe

Question

I have two data from.

df1 with columns: id,x1,x2,x3,x4,....xn

df2 with columns: id,y.

df3 =pd.concat([df1,df2],axis=1)

when I use pandas concat to combine them, it became

id,y,id,x1,x2,x3...xn.

there are two id here.How can I get rid of one.

I have tried :

df3=pd.concat([df1,df2],axis=1).drop_duplicates().reset_index(drop=True).

but not work.

score 1 · Accepted Answer · answered Sep 23 '19 at 22:23

1

DataFrames are concatenated on the index. Make sure that id is the index before concatenating:

df3 = pd.concat([df1.set_index('id'), 
                 df2.set_index('id')], axis=1).reset_index()

Or, better yet, use join:

df3 = df1.join(df2, on='id')

answered Sep 23 '19 at 22:23

DYZ

Thank you so much. The first one works. The second shows "You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat" – Yuuu Sep 23 '19 at 22:27
Apparently your `id` columns in the two dataframes are of different data types: one is strings, the other is numbers. They may look the same, but they are not. Convert them to the same datatype with `.astype()`. – DYZ Sep 23 '19 at 22:28

score 0 · Answer 2 · answered Sep 23 '19 at 22:24

0

drop_duplicates() only removes rows that are completely identical.

what you're looking for is pd.merge().

pd.merge(df1, df2, on='id)

answered Sep 23 '19 at 22:24

Zoe

2 Answers2