I have 2 DataFrames. DF1 and DF2.

(Please note that DF2 has more entries than DF1)

What I want to do is add the nationality column to DF1 (The script should look up the name in DF1 and find the corresponding nationality in DF2).

I am currently using the code below:

    final_DF = df1.merge(df2[['PPS', 'Nationality']], on=['PPS'], how='left')

Although the nationality column is being added to DF1, the code is duplicating entries and also adding additional data from DF2 that I do not want.

Is there a method to get the nationality from DF2 while only keeping the DF1 data?

Thanks

[Image: DF1]

[Image: DF2]

[Image: output]

PythonBeginner

1 Answer

There are two things you need to check.

  1. Check whether DF2 contains any duplicates. Duplicated rows in DF2 produce duplicated rows in the merged result.

  2. You can define 'how' in the merge statement, so it will look like this:

    final_DF = DF1.merge(DF2, on=['Name'], how='left')

Since you want to keep only the DF1 rows, 'left' should be the ideal option for you.
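To illustrate both points together, here is a minimal sketch. The sample rows are made up (the real DF1 and DF2 are only shown as screenshots), and it assumes `PPS` is the lookup key, as in the question's code, and that keeping the first match per `PPS` is acceptable.

```python
import pandas as pd

# Hypothetical sample data, not the data from the question's screenshots.
df1 = pd.DataFrame({"PPS": [11, 12, 13], "Name": ["Ann", "Bob", "Cat"]})
df2 = pd.DataFrame({
    "PPS": [11, 11, 12, 13, 14],
    "Name": ["Ann", "Ann", "Bob", "Cat", "Dan"],
    "Nationality": ["Irish", "Irish", "French", "Polish", "Spanish"],
})

# Keep only the key and the wanted column, with one row per PPS value.
lookup = df2[["PPS", "Nationality"]].drop_duplicates(subset="PPS", keep="first")

# how="left" keeps every DF1 row and nothing that exists only in DF2.
# validate="m:1" makes pandas raise a MergeError if the right-hand keys are not unique.
final_df = df1.merge(lookup, on="PPS", how="left", validate="m:1")
print(final_df)
```

If DF2 holds conflicting nationalities for the same PPS, `drop_duplicates` silently keeps the first one, so it may be worth inspecting those rows before dropping them.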

For more info, refer to this.

Amit Gupta
  • great, if this helped then kindly accept the answer and upvote it. Cheers!! – Amit Gupta Jun 22 '21 at 07:21
  • @amitgrupta Is there a way to specify the column you want? For example, if there were additional columns in DF2 but I only wanted the nationality column, how would I achieve this? – PythonBeginner Jun 22 '21 at 08:18
  • Yes, like you're already specifying in your code. For example: `DF1[[col1, col2]].merge(DF2[[colx, coly, col1]], on=col1, how='left')`, or alternatively you can just drop the additional columns after the merge. – Amit Gupta Jun 22 '21 at 08:24
  • When I use my code on another dataframe that has multiple columns (I only need the nationality column), there are duplicate entries. Is there a reason why this might happen? – PythonBeginner Jun 22 '21 at 08:27
  • After the merge you'll get all the columns from both dataframes if you don't specify which ones you want. As I said, you can drop the additional columns after the merge using `df_new = df.drop([col_to_drop], axis=1)` or specify the columns while merging. – Amit Gupta Jun 22 '21 at 08:30
  • I know I'm able to drop columns, I just can't overcome the duplication I am experiencing. – PythonBeginner Jun 22 '21 at 08:31
  • Can you paste a screenshot of that? Maybe I'll be able to understand it more clearly. – Amit Gupta Jun 22 '21 at 08:32
  • please see images of the DFs and the output above. Thanks – PythonBeginner Jun 22 '21 at 09:07
  • The merge is working perfectly. The issue is that there are duplicate PPS values in DF2. For example, there are two rows with 11 in the PPS column of DF2, and both get mapped to the PPS value 11 of DF1 in the output. – Amit Gupta Jun 22 '21 at 12:04
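A small sketch of this effect, using made-up rows rather than the data in the screenshots: a left merge returns one output row per matching DF2 row, so a repeated `PPS` value in DF2 repeats the corresponding DF1 row. `Series.duplicated(keep=False)` is one way to list the rows responsible.

```python
import pandas as pd

# Hypothetical data: PPS 11 appears twice in df2.
df1 = pd.DataFrame({"PPS": [11, 12]})
df2 = pd.DataFrame({
    "PPS": [11, 11, 12],
    "Nationality": ["Irish", "Irish", "French"],
})

# The DF1 row with PPS 11 matches two DF2 rows, so it appears twice in the result.
merged = df1.merge(df2, on="PPS", how="left")
print(len(df1), len(merged))

# keep=False flags every occurrence of a repeated key, not just the later ones.
dup_rows = df2[df2["PPS"].duplicated(keep=False)]
print(dup_rows)
```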