Replace rows in a Pandas df with rows from another df

Question

I have 2 Pandas dfs, A and B. Both have 10 columns and the index 'ID'. Where the IDs of A and B match, I want to replace the rows of B with the rows of A. I have tried to use pd.update, but no success yet. Any help appreciated.

try `A.combine_first(B)` should work.See [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — shivsn, Sep 01 '16 at 09:13
If you update rows of B with rows of A the same result will come? not clear can you provide an example of expected output — Marlon Abeykoon, Sep 01 '16 at 09:21
Marlon: No, A and B contain some matching IDs, but the contents of the other columns is different. — Chris Parry, Sep 01 '16 at 09:44

score 14 · Accepted Answer · edited Nov 25 '20 at 06:15

14

below code should do the trick

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])



df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print df1

output

   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

Edit from comments:

To replace the whole row instead of only some columns:

cols = list(df1.columns) 
df1.loc[df1.A.isin(df2.A), cols] = df2[cols]

edited Nov 25 '20 at 06:15

Borealis

8,044
17
64
112

answered Sep 01 '16 at 12:53

Shijo

9,313
3
19
31

1

Old question but what if I want to replace the whole row instead of some columns? – Rafael Almeida Jul 07 '17 at 09:02
@ Rafael Almeida I assume that you have same set columns in both dataframes; Try this; cols = list(df1.columns) df1.loc[df1.A.isin(df2.A), cols] = df2[cols] – Shijo Jul 07 '17 at 15:44
the last line `df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]` produce `NaN` for me. I need to ad `.values` in the assigned values. It becomes `df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']].values` – mahendri Jun 09 '22 at 02:39

fpersyn · Answer 2 · 2019-12-02T16:35:44.297

2

You can empty your target cells in A (by setting them to NaN) and use the combine_first() method to fill those with B's values. Although it may sound counter-intuitive, this approach gives you the flexibility to both target rows and specific columns in 2 lines of code. Hope that helps.

An example replacing the full row's that have an index match:

# set-up
cols = ['c1','c2','c3']
A = pd.DataFrame(np.arange(9).reshape((3,3)), columns=cols)
B = pd.DataFrame(np.arange(10,16).reshape((2,3)), columns=cols)

#solution
A.loc[B.index] = np.nan
A = A.combine_first(B)

An example of only replacing certain target columns for row's that have an index match:

A.loc[B.index, ['c2','c3']] = np.nan
A = A.combine_first(B)

edited Dec 02 '19 at 16:35

answered Sep 11 '19 at 18:05

fpersyn

1,045
1
12
19

Regarding the `SettingWithCopyWarning` resulting from using [`.loc()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html). It is my understanding that you can safely ignore it when you intend to overwrite the original DataFrame *A*. Not entirely sure if that is the case for this question. See [this thread](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas#answer-20627316) for more info. – fpersyn Dec 02 '19 at 15:59

score 0 · Answer 3 · answered Dec 08 '22 at 15:42

0

Above code works based on the index value. If we have different row counts for both data frame. It will not work. For that, we need to set a specific column to index

answered Dec 08 '22 at 15:42

Satish Khullar

91
1
2

Replace rows in a Pandas df with rows from another df

3 Answers3

Linked