0

Say I have two DataFrames, an "original" and "correction" dataframe such that

orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})

Name Code
A    9q8
B    7k2

corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A    9R8

How can I assign orig['Code']=corr['Code'] wherever orig['Name']==corr['Name] without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:

Name Code
A    9R8
B    7k2

I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.

For example, if I merge orig and corr, I get:

out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A    9q8
A    9R8
B    7k2

Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8 and A 9R8 was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2.

David
  • 606
  • 9
  • 19
  • 1
    `orig.update(corr)` – It_is_Chris Nov 19 '18 at 16:46
  • Thanks @Chris, I tried this and it works! But how? From the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html) there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when `.update()` will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to `.update()` that allows you to chose index/column – David Nov 19 '18 at 17:15
  • `update()` "Aligns on indices" and updates matching columns in the left dataframe with data from the right – It_is_Chris Nov 19 '18 at 17:41
  • I see, so `.update()` replaces the entries in the left table with those in the right wherever the indices match, `index_left==index_right`, but only for columns that are in both the left table and the right table, `colName_left == colName_right`. And, if I want, I can change the index I'm matching on to some other arbitrary column with `.set_index()`. – David Nov 20 '18 at 17:45
  • 1
    yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: `df1['col1'].update(df2['col2'])` this will update `df1['col1']` with the data from `df2['col2']` where the indices match – It_is_Chris Nov 20 '18 at 19:05

1 Answers1

3
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))

orig.reset_index(inplace=True)
orig
Out[164]: 
  Name Code
0    A  9R8
1    B  7k2
BENY
  • 317,841
  • 20
  • 164
  • 234