0

Given df1 and df2 dataframes, I would like to substitute the 'Name' items in df2 for their corresponding 'Area' values in df1.

In other words, the values in df2['Name] should be respectively A1 A3 A2 A1 A2

import pandas as pd

df1 = pd.DataFrame({'Name':['Tom', 'nick', 'Albert', 'Josh'],
            'Area':['A1','A2','A3','A1']
                       })
df2 = pd.DataFrame({'Name':['Tom', 'Albert', 'Nick', 'Josh','Nick'],
            'pet':['dog','cat','mouse','dog','mouse']
                       })

After searching for some suggestions, I added this line:

df = df2.merge(df1, on=['Area','Name'], how='left')

print(df)

But I get this error:

# Check for duplicates

KeyError: 'Area'

and this line:

df = df2['Name'].map(df1.set_index('Area')['Name'])
print(df)

outputs this error

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
john johns
  • 167
  • 7

3 Answers3

0

That's because both dataframes should have a column named Area.

So remove the Area when merging.

df = df2.merge(df1, on=['Name'], how='left')
ql.user2511
  • 369
  • 2
  • 12
0

You do not have any column named 'Area' in your second dataframe (df2). This is why the error is being shown.

Kazi
  • 381
  • 2
  • 13
  • unfortunately none of the solutions posted here work. By printing df I only get the 'Name' column filled in with NaN values – john johns Sep 28 '21 at 12:52
  • Could you please state your objective more clearly? What exactly are you trying to do here? – Kazi Sep 28 '21 at 12:53
  • As I wrote above, the objective is that the values df2['Name] should be respectively A1 A3 A2 A1 A2. That is because when df1['Name'] = 'Tom', df1['Area]='A1', when df1['Name'] = 'Nick', df1['Area']=A2' etc... clear now? – john johns Sep 28 '21 at 12:58
0

You can use dict and zip commands to make the 2 dataframe columns you want to map into a dictionary. Then you can map the values easily.

import pandas as pd

df1 = pd.DataFrame({
     'Name':['Tom', 'nick', 'Albert', 'Josh'],
     'Area':['A1','A2','A3','A1']
})

df2 = pd.DataFrame({
    'Name':['Tom', 'Albert', 'Nick', 'Josh','Nick'],
    'pet':['dog','cat','mouse','dog','mouse']
})

df2['Name'] = df2['Name'].map(dict(zip(df1['Name'], df1['Area'])))
dgrealy
  • 21
  • 2
  • this code almost produces the right output. The new 3rd and 5th row are 'NaN' when they should both be 'A2' – john johns Sep 28 '21 at 13:34
  • That is because in df1, the name 'nick' isn't capitalized. It is capitalized in df2. Hope that solves it :) – dgrealy Sep 28 '21 at 13:36