update multiple columns of pandas dataframe using np.where or loc

Question

We have a dataframe:

import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

which looks like this:

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
A2 | B2 | 1   | 2
A3 | B3 | 3   | 4
A4 | B4 | NaN | NaN
-------------------

Columns C and D are always either both populated or both NaN; there is never a row where C is NaN and D has a value, or vice versa.
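Since everything below relies on C and D being missing together, a short sketch that checks that property (and shows how pandas pads the short rows) may be useful. It rebuilds the question's df; the names paired and m are illustrative only.

```python
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# The two short rows are padded with missing values, so C and D hold NaN there
# and are inferred as float64 (which is why 1 and 2 later print as 1.0 and 2.0).
print(df.dtypes)

# True on every row where C and D are both missing or both present.
paired = df['C'].isna() == df['D'].isna()
assert paired.all(), "some row has only one of C/D missing"

# Given that invariant, one row mask built from C selects the rows for both pairs.
m = df['C'].notna()
print(m.tolist())
```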

My goal is to transform the dataframe to look like this:

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
1  | 2  | 1   | 2
3  | 4  | 3   | 4
A4 | B4 | NaN | NaN
-------------------

I have tried

df.loc[df['C'].notna(), ['A', 'B']] = df.loc[df['C'].notna(), ['C', 'D']]
# this assigns NaN to A and B instead of 1, 2, 3, 4

and

m = df['C'].notna()
df[['A', 'B']] = np.where(m, df[['C', 'D']], df[['A', 'B']])
# raises ValueError: operands could not be broadcast together with shapes (4,) (4,2) (4,2)
df[['X', 'Y']] = pd.DataFrame(np.where(m, df[['C', 'D']]), df[['A', 'B']])
# raises ValueError: either both or neither of x and y should be given
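The first attempt returns NaN because .loc aligns a right-hand DataFrame on both its index and its column labels before assigning. The selected frame only has columns C and D, so nothing matches A and B. A minimal sketch reproducing this, plus one alternative that keeps alignment but renames the source columns first (rename is a swapped-in technique, not taken from the answers below). The sketch casts A and B to object explicitly so they can hold numbers as well as labels.

```python
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
# Cast A and B to generic object columns so they can hold numbers as well as labels.
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D']).astype({'A': object, 'B': object})
m = df['C'].notna()

# First attempt from the question: .loc aligns the right-hand DataFrame on its
# column labels. It only has C and D, so A and B are filled with NaN.
broken = df.copy()
broken.loc[m, ['A', 'B']] = broken.loc[m, ['C', 'D']]
print(broken)

# Renaming the source columns to the target labels lets the alignment match them.
fixed = df.copy()
fixed.loc[m, ['A', 'B']] = fixed.loc[m, ['C', 'D']].rename(columns={'C': 'A', 'D': 'B'})
print(fixed)
```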

I have looked into this question here and tried converting df[['C', 'D']] to lists and assigning it back as a new DataFrame, but I still have not been able to make it work.

I know I can assign each pair individually (A from C, B from D), but I am dealing with a large number of such pairs and want to avoid looping through them. Is there a clean way to do this in a single pass?

I am using pandas 0.25.3.

Thanks for the help!

Grzegorz Skibinski · Answer 1 · 2020-01-12T09:11:39.053


Using DataFrame.loc[...], one column at a time:

# copy C into A and D into B only on rows where the source value is present
df.loc[df['C'].notna(), 'A'] = df.loc[df['C'].notna(), 'C']
df.loc[df['D'].notna(), 'B'] = df.loc[df['D'].notna(), 'D']

Using np.where(...):

import numpy as np

# reshape the (n,) row mask to (n, 1) so it broadcasts across both columns
df[['A', 'B']] = np.where(df['C'].notna().to_numpy().reshape(-1, 1), df[['C', 'D']], df[['A', 'B']])
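The reshape is what separates this from the question's second attempt. NumPy broadcasting compares shapes starting from the trailing axis, so a 1-D mask of length 4 is lined up against the 2 columns and fails, while a (4, 1) mask is repeated across each row. A small NumPy-only sketch, with plain arrays standing in for df['C'].notna(), df[['C', 'D']] and df[['A', 'B']]:

```python
import numpy as np

mask = np.array([False, True, True, False])                           # shape (4,)
new = np.array([[np.nan, np.nan], [1, 2], [3, 4], [np.nan, np.nan]])   # shape (4, 2)
old = np.array([['A1', 'B1'], ['A2', 'B2'], ['A3', 'B3'], ['A4', 'B4']], dtype=object)

# A 1-D mask is matched against the last axis (length 2), so broadcasting fails.
try:
    np.where(mask, new, old)
except ValueError as err:
    print(err)

# As a (4, 1) column the mask is stretched across both columns of each row.
result = np.where(mask.reshape(-1, 1), new, old)
print(result)
```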

Output:

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN

answered Jan 12 '20 at 09:05

FBruzzesi · Accepted Answer · 2020-01-12T09:08:57.243


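Try assigning the values (a NumPy array via .to_numpy()) instead of the DataFrame itself, so .loc does not try to align the C/D column labels with A/B: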

import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A','B','C','D'])

df.loc[df['C'].notna(), ['A','B']] = df.loc[df['C'].notna(), ['C','D']].to_numpy()

df

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN
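For the many-pairs case in the question, the same idea extends to parallel lists of column names. A minimal sketch, assuming hypothetical lists targets and sources that pair each column to overwrite with its source. It uses a 2-D element-wise mask with np.where (a variant of the np.where answer above) rather than a single row mask, so each pair is checked on its own.

```python
import numpy as np
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Hypothetical pairing lists: targets[i] is overwritten from sources[i]
# wherever sources[i] is not NaN. Extend both lists for more pairs.
targets = ['A', 'B']
sources = ['C', 'D']

# (n_rows, n_pairs) boolean mask, one column per source, so the pairs do not
# need to share the same NaN pattern.
mask = df[sources].notna().to_numpy()

# np.where works on plain arrays, so no label alignment is involved.
df[targets] = np.where(mask, df[sources].to_numpy(), df[targets].to_numpy())
print(df)
```

If every pair is guaranteed to share one NaN pattern, as C and D do here, the accepted answer's form scales the same way with a single row mask m: df.loc[m, targets] = df.loc[m, sources].to_numpy().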

answered Jan 12 '20 at 09:00
