0

I have two dataframes

df1=

name Code
1234567 H74

df2=

name Code
1234567 J11
1234321 J12

I want to change 'Code' of df2 to the value that is in 'Code' of df1 if 'name' in df2 matches 'name' in df1.

I have tried creating a dictionary of df1;

dict = df1.set_index('name').to_dict()['Code']
df2.replace(dict)

However that results in changing 'name' to 'Code' like:

name Code
H74 J11
1234321 J12

Any help in achieving the following would be greatly appreciated:

name Code
1234567 H74
1234321 J12

Thanks in advance!

Corralien
  • 109,409
  • 8
  • 28
  • 52
jnord
  • 55
  • 6
  • 1
    Quick approach would be to do a left merge, with df2 on the left, then use `pandas.where` to replace the df2 'Code' values with the code from df1 where there was a match. – whege Oct 13 '21 at 21:50

3 Answers3

1

Use pd.concat and drop_duplicates:

>>> pd.concat([df1, df2]).drop_duplicates('name')

      name Code
0  1234567  H74
1  1234321  J12
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

Try these -

Using pd.DataFrame.where

  1. ~df2['name'].isin(df1['name']) checks for the names that exist in df2 but not in df1
  2. The pd.where keeps the current code values as is, if above is true, else for the false, it picks up df1['Code']
  3. This can now be use to replace the df1['Code'] column.

Read documentation.

idx = ~df2['name'].isin(df1['name'])
df2['Code'] = df2['Code'].where(idx,df1['Code'],axis=0)
      name Code
0  1234567  H74
1  1234321  J12

Using pd.DataFrame.join

  1. You can set the name column as index for both df1 and df2, and join (left) on df2.
  2. Next, you can use ffill(axis=1) to fill the df2 code values to the nans (where df1 doesn't exist)
  3. Finally you can fetch just the Code column, which is actually the df1's code column filled with df2 code values where it was nan.
  4. Reset the index to get back what you need.
  5. Remember to save it in a variable!

Read documentation.

df2.set_index('name').join(df1.set_index('name'), lsuffix='_')\
                     .ffill(1)['Code']\
                     .reset_index()
      name Code
0  1234567  H74
1  1234321  J12

Using pd.DataFrame.update

This is simple, but an in-place operation. So be careful with it!

Read documentation.

df2.update(df1)
df2
        name Code
0  1234567.0  H74
1  1234321.0  J12
Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
  • Thanks for your response! When Trying the first suggestion it get an the error 'cannot reindex from a duplicate axis' – jnord Oct 13 '21 at 22:37
  • could you tell me the error? the approach works for the same dataset that you have provided. Also, do let me know if other approaches worked for you. – Akshay Sehgal Oct 13 '21 at 22:37
  • The error is ValueError: cannot reindex from a duplicate axis. The dataset that I am working with contains more columns in df2 than I provided. idx = ~df2['name'].isin(df1['name']) seems to work correctly because there is only one value that is similar between the two dataframes – jnord Oct 13 '21 at 22:39
  • can you try to update your input dataset in the above question, where the error is occuring? because with your current data example, this would not happen. – Akshay Sehgal Oct 13 '21 at 22:41
  • The second code line is only updating the `Code` column. `~df2['name'].isin(df1['name'])` work on just the `name` column and returns a boolean. Is this running properly? – Akshay Sehgal Oct 13 '21 at 22:42
  • I continue to get the ValueError – jnord Oct 13 '21 at 22:45
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/238116/discussion-between-akshay-sehgal-and-jnord). – Akshay Sehgal Oct 13 '21 at 22:47
0

This can be done by setting the 'name' column as the index and using the update function.

import pandas as pd

df1 = pd.DataFrame({'Name':'Ashutosh', 'Code':'AP', 'Number':'1234567', 'ID':'H74'}, index=[1]).set_index('Name')

df2 = pd.DataFrame({'Name':'Ashutosh', 'Code':'JN', 'Number':'1234567', 'ID':'J11', 'Alternate':'1234321', 'SID':'J12'}, index=[1]).set_index('Name')

So my dataframes would look like this:

>>> df1

          Code  Number  ID
Name            
Ashutosh    AP  1234567 H74

>>> df2


          Code  Number  ID  Alternate SID
Name                    
Ashutosh    JN  1234567 J11 1234321   J12

Using the update function:

>>> df2['Code'].update(df1['Code'])
>>> df2


          Code  Number  ID  Alternate   SID
Name                    
Ashutosh    AP  1234567 J11 1234321     J12
porrrwal
  • 169
  • 1
  • 14