0

I have two dataset with columns names as below.

DF1:

Record Type , FNAME , MNAME , LNAME , ID , etc.

DF2:

ID, PREFIX , FNAME , MNAME , LNAME , etc.

If ID field in DF2 found a match in DF1, then copy matching values in 3 columns(FNAME , MNAME , LNAME) from DF2 to DF1.

thank you very much.

Alani
  • 73
  • 6
  • Welcome to Stack Overflow! Please include a small subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 17 '21 at 20:01
  • Additionally, please go through the intro [tour](https://stackoverflow.com/tour), and review [how to ask a good question](https://stackoverflow.com/help/how-to-ask) to understand the community expectations and to help you improve your current and future questions. – Henry Ecker May 17 '21 at 20:01

1 Answers1

0

You can do that using pd.merge():

new_df = pd.merge(DF1, DF2[['ID', 'FNAME' , 'MNAME' , 'LNAME']], on = 'ID', how = 'left')
Shivam Roy
  • 1,961
  • 3
  • 10
  • 23
  • 1
    Thank you so much, that works fine, however, is it a recommended way to to merge in case we have a production data with several millions of rows ? – Alani May 17 '21 at 21:40
  • Yes, you’re actually doing a left join, panda’s merge function is an efficient way to do it. And if you found the answer helpful, please consider upvoting and accepting it, following the Stack Overflow culture, thanks :) – Shivam Roy May 17 '21 at 21:46
  • If you still want to optimise your merge function, you can set the merging columns ‘ID’ in your case as index using set_index. The index column in a DataFrame is hashable and gives a better performance. You can even sort the index for more optimised performance – Shivam Roy May 17 '21 at 21:54
  • 1
    absolutely, I just did accept the answer but didnt allow me to upvote as I need 15 reputation point, is there a way I can follow you ? this is my first interaction in this community honestly. – Alani May 17 '21 at 22:27
  • no problem mate, thanks :) – Shivam Roy May 17 '21 at 22:28
  • Shivam , could you please tell me why on the new_df I get extra columns ['ID_x', 'FNAME_x' , 'MNAME__x' , 'LNAME_x'] while it is not update the required original columns on DF1 ['ID', 'FNAME' , 'MNAME' , 'LNAME'] – Alani May 19 '21 at 06:18
  • So do you want to only retain the columns in DF2 ? It is giving the columns because DF1 and DF2 have the columns with the same name, if you want to only retain the columns in DF2 mapped to the IDs in DF1 then you will need to use the following code: new_df = pd.merge(DF1[[‘ID’]] , DF2[['ID', 'FNAME' , 'MNAME' , 'LNAME']], on = 'ID', how = 'left') . – Shivam Roy May 19 '21 at 06:59
  • sorry for complicating this, bear with me. I want to retain the columns on DF1 after matching with DF2 based on ID, – Alani May 19 '21 at 14:16
  • so the destination columns on the DF1 and the source columns from DF2 have all the same names. – Alani May 19 '21 at 14:29
  • Yes, since we are combining the DataFrames, we can’t have the same name columns, this is the reason we get ‘_x’ suffixes for columns with the same name. I’m actually still a bit confused, about which all columns from DF1 and DF2 do you wish to retain in the final DataFrame ? – Shivam Roy May 19 '21 at 14:54
  • my final dataframe should be DF1, so technically it should retain all columns from DF1 while updating ['FNAME' , 'MNAME' , 'LNAME'] from DF2 based on matching 'ID' in both dataframes. – Alani May 19 '21 at 15:15
  • So are there IDs in DF1 which have null values and you want to update those null values that match ID of DF2 ? Because if DF1 already has those column values we can only overwrite it – Shivam Roy May 19 '21 at 17:06
  • no there is no Null values, values in DF2 [FNAME , MNAME , LNAME] will be copied to DF1 [FNAME , MNAME , LNAME] that correspond to a matching value in ID, – Alani May 19 '21 at 19:15
  • Then this should work, it will map the IDs in DF2 to matching IDs in DF1: new_df = pd.merge(DF1[[‘ID’]] , DF2[['ID', 'FNAME' , 'MNAME' , 'LNAME']], on = 'ID', how = 'left') – Shivam Roy May 20 '21 at 07:17
  • it worked fine except the new_df contains only 4 columns [['ID', 'FNAME' , 'MNAME' , 'LNAME']], what if I want to keep all columns of DF1 ??? DF1 has 91 columns, I want to maintain all columns while updating these columns [['ID', 'FNAME' , 'MNAME' , 'LNAME']] ? – Alani May 21 '21 at 05:43
  • Ohh now I get your problem, you had more rows other than thode three which you wanted to retain, try this code then: new_df = pd.merge(DF1.drop(['FNAME' , 'MNAME' , 'LNAME'], axis=1) , DF2[['ID', 'FNAME' , 'MNAME' , 'LNAME']], on = 'ID', how = 'left') – Shivam Roy May 21 '21 at 07:41
  • sorry things been evolved for me, let me give you a better insight. DF1 shape (33,92), DF2 shape(11,18). I want the new_df to have (33,92). while copying 18 columns from DF2 with identical 'ID' value. SIDE NOTE : 18 columns in DF2 has the same names with in 91 columns in DF1. – Alani May 21 '21 at 15:32
  • Create a list of all the 18 columns you want to replace: `cols_to_replace = ['FNAME' , 'MNAME' , 'LNAME'....]` and then use the code: `new_df = pd.merge(DF1.drop(cols_to_replace, axis=1) , DF2[['ID'] + cols_to_replace], on = 'ID', how = 'left')` . An efficient way to create the list without having to type it manually would be print `df.columns` which will give you the list of the columns, you can then copy paste it. Just be cautious, this will also have the column `ID` and you want want to include it in the list as it is required. I really hope this works for you. – Shivam Roy May 21 '21 at 19:02