
We have a dataframe:

import numpy as np
import pandas as pd
data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

which looks like this:

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
A2 | B2 | 1   | 2
A3 | B3 | 3   | 4
A4 | B4 | NaN | NaN
-------------------

Columns C and D are always either both populated or both NaN (there won't be a case where C is NaN while D has a value, or vice versa).

My goal is to transform the dataframe to look like this:

A  | B  | C   | D
-------------------
A1 | B1 | NaN | NaN
1  | 2  | 1   | 2
3  | 4  | 3   | 4
A4 | B4 | NaN | NaN
-------------------

I have tried

df.loc[df['C'].notna(), ['A', 'B']] = df.loc[df['C'].notna(), ['C', 'D']]
# the above just assigns back NaN values instead of 1,2,3,4 

and

m = df['C'].notna()
df[['A', 'B']] = np.where(m, df[['C', 'D']], df[['A', 'B']])
# the above errors with operands could not be broadcast together with shapes (4,) (4,2) (4,2) 
df[['X', 'Y']] = pd.DataFrame(np.where(m, df[['C', 'D']]), df[['A', 'B']])
# the above errors with ValueError: either both or neither of X and Y should be given

I have looked into this question here and tried a few ways of converting df[['C', 'D']] to lists and assigning the result back as a new dataframe, but I still have not been able to make it work.

I know I can assign the columns (A-C, B-D) individually but I am dealing with a large number of such pairs and want to avoid looping through them. Is there a clean way to do this operation in a single pass?

I am using pandas version 0.25.3.

Thanks for the help!

FBruzzesi
kpython

2 Answers


Using df.loc[...] (one column pair at a time):

df.loc[~df['C'].isna(), 'A'] = df.loc[~df['C'].isna(), 'C']
df.loc[~df['D'].isna(), 'B'] = df.loc[~df['D'].isna(), 'D']

Using np.where(...):

import numpy as np

df[['A', 'B']] = np.where(df['C'].notna().to_numpy().reshape(-1, 1), df[['C', 'D']], df[['A', 'B']])

Output:

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN
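
If there are many such pairs, the np.where approach can probably be extended by passing all source and target columns at once. The sketch below assumes the pairs are listed in two hypothetical lists, targets and sources (these names are not from the question), and builds an element-wise mask so that each target column is decided by its own source column. Because the mask then has the same shape as the value arrays, no reshape is needed.

```python
import numpy as np
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Hypothetical pairing lists: targets[i] is overwritten by sources[i]
# wherever sources[i] is not NaN. Extend both lists for more pairs.
targets = ['A', 'B']
sources = ['C', 'D']

# Boolean array of shape (n_rows, n_pairs), one flag per cell.
mask = df[sources].notna().to_numpy()

# Take the source value where it exists, otherwise keep the target value.
df[targets] = np.where(mask, df[sources].to_numpy(), df[targets].to_numpy())
```

The two lists must have the same length and the same order, since columns are matched by position rather than by name.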
Grzegorz Skibinski

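Try assigning the underlying values rather than the DataFrame itself, so that pandas assigns by position instead of aligning on the column labels: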

import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A','B','C','D'])

df.loc[df['C'].notna(), ['A','B']] = df.loc[df['C'].notna(), ['C','D']].to_numpy()

df

    A   B    C    D
0  A1  B1  NaN  NaN
1   1   2  1.0  2.0
2   3   4  3.0  4.0
3  A4  B4  NaN  NaN
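
As far as I understand, the first attempt in the question produced NaN because a DataFrame on the right-hand side of a .loc assignment is aligned on its labels, and a frame with columns C and D has nothing under the labels A and B. The sketch below only approximates that alignment step with reindex (my choice for illustration, not what pandas necessarily calls internally) and shows that to_numpy() leaves a plain, label-free array.

```python
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

m = df['C'].notna()
rhs = df.loc[m, ['C', 'D']]

# Looking up columns A and B in a frame that only has C and D finds
# nothing, so every cell of the aligned frame is NaN.
aligned = rhs.reindex(columns=['A', 'B'])

# to_numpy() drops the row and column labels, leaving a (2, 2) array
# that can only be matched by position.
values = rhs.to_numpy()
```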
FBruzzesi