11

I have 2 Pandas dfs, A and B. Both have 10 columns and the index 'ID'. Where the IDs of A and B match, I want to replace the rows of B with the rows of A. I have tried to use pd.update, but no success yet. Any help appreciated.

Chris Parry
  • 2,937
  • 7
  • 30
  • 71
  • 3
    try `A.combine_first(B)` should work.See [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – shivsn Sep 01 '16 at 09:13
  • If you update rows of B with rows of A the same result will come? not clear can you provide an example of expected output – Marlon Abeykoon Sep 01 '16 at 09:21
  • Marlon: No, A and B contain some matching IDs, but the contents of the other columns is different. – Chris Parry Sep 01 '16 at 09:44

3 Answers3

14

below code should do the trick

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])



df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print df1

output

   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

Edit from comments:

To replace the whole row instead of only some columns:

cols = list(df1.columns) 
df1.loc[df1.A.isin(df2.A), cols] = df2[cols]
Borealis
  • 8,044
  • 17
  • 64
  • 112
Shijo
  • 9,313
  • 3
  • 19
  • 31
  • 1
    Old question but what if I want to replace the whole row instead of some columns? – Rafael Almeida Jul 07 '17 at 09:02
  • @ Rafael Almeida I assume that you have same set columns in both dataframes; Try this; cols = list(df1.columns) df1.loc[df1.A.isin(df2.A), cols] = df2[cols] – Shijo Jul 07 '17 at 15:44
  • the last line `df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]` produce `NaN` for me. I need to ad `.values` in the assigned values. It becomes `df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']].values` – mahendri Jun 09 '22 at 02:39
2

You can empty your target cells in A (by setting them to NaN) and use the combine_first() method to fill those with B's values. Although it may sound counter-intuitive, this approach gives you the flexibility to both target rows and specific columns in 2 lines of code. Hope that helps.

An example replacing the full row's that have an index match:

# set-up
cols = ['c1','c2','c3']
A = pd.DataFrame(np.arange(9).reshape((3,3)), columns=cols)
B = pd.DataFrame(np.arange(10,16).reshape((2,3)), columns=cols)

#solution
A.loc[B.index] = np.nan
A = A.combine_first(B)

An example of only replacing certain target columns for row's that have an index match:

A.loc[B.index, ['c2','c3']] = np.nan
A = A.combine_first(B)
fpersyn
  • 1,045
  • 1
  • 12
  • 19
  • Regarding the `SettingWithCopyWarning` resulting from using [`.loc()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html). It is my understanding that you can safely ignore it when you intend to overwrite the original DataFrame *A*. Not entirely sure if that is the case for this question. See [this thread](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas#answer-20627316) for more info. – fpersyn Dec 02 '19 at 15:59
0

enter image description here Above code works based on the index value. If we have different row counts for both data frame. It will not work. For that, we need to set a specific column to index