I have two dataframes: one contains the ground-truth city names, and the other is read from arbitrary other files.

  import pandas as pd

  ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
  random_df = pd.DataFrame(['DenvoR', 'cleveland'], columns=['cities'])

I need to compare the cities column of random_df with the cities column of ground_truth and, where a name differs only in letter case, replace it with the ground_truth spelling. So far I have used a for loop; it works but is not elegant. Any suggestions?

newleaf

1 Answer

Check with str.upper, isin, and map:

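# Upper-cased copies of both columns, used as case-insensitive keys
s1 = ground_truth.cities.str.upper()
s2 = random_df.cities.str.upper()
# Rows of ground_truth whose key also appears in random_df take random_df's spelling
ground_truth.loc[s1.isin(s2), 'cities'] = s1.map(dict(zip(s2, random_df.cities)))
ground_truth
      cities
0   New York
1     DenvoR
2  cleveland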
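
The snippet above writes random_df's spellings into the ground-truth frame. For the direction the question asks for (correcting random_df from ground_truth), the same idea can be turned around. This is only a minimal sketch: the names `canonical` and `fixed` are made up for illustration, and it assumes the lower-cased ground-truth names are unique.

```python
import pandas as pd

ground_truth = pd.DataFrame(['New York', 'Denvor', 'Cleveland'], columns=['cities'])
random_df = pd.DataFrame(['DenvoR', 'cleveland'], columns=['cities'])

# Hypothetical helper: lower-cased ground-truth name -> canonical spelling.
# Assumes no two ground-truth names collide once lower-cased.
canonical = dict(zip(ground_truth['cities'].str.lower(), ground_truth['cities']))

# Look up each random_df city by its lower-cased form. Names without a
# case-insensitive match map to NaN, so fall back to the original value.
fixed = random_df['cities'].str.lower().map(canonical).fillna(random_df['cities'])
random_df['cities'] = fixed

print(random_df)
```

With the sample data, random_df['cities'] ends up as Denvor and Cleveland. A city that does not appear in ground_truth at all would be left unchanged.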
BENY