We have a dataframe:
import numpy as np
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
which looks like this
A | B | C | D
-------------------
A1 | B1 | NaN | NaN
A2 | B2 | 1 | 2
A3 | B3 | 3 | 4
A4 | B4 | NaN | NaN
-------------------
Columns C and D will either both be populated or both be NaN (there won't be a scenario where C is NaN and D has a value, or vice versa).
My goal is to transform the dataframe to look like this:
A | B | C | D
-------------------
A1 | B1 | NaN | NaN
1 | 2 | 1 | 2
3 | 4 | 3 | 4
A4 | B4 | NaN | NaN
-------------------
I have tried
df.loc[df['C'].notna(), ['A', 'B']] = df.loc[df['C'].notna(), ['C', 'D']]
# the above just assigns back NaN values instead of 1,2,3,4
and
m = df['C'].notna()
df[['A', 'B']] = np.where(m, df[['C', 'D']], df[['A', 'B']])
# the above errors with operands could not be broadcast together with shapes (4,) (4,2) (4,2)
df[['X', 'Y']] = pd.DataFrame(np.where(m, df[['C', 'D']]), df[['A', 'B']])
# the above errors with ValueError: either both or neither of X and Y should be given
I have looked into this question here, and tried some ways to convert df[['C', 'D']] to lists and assign them back as a new dataframe, but I have still not been able to make it work.
I know I can assign the columns (A-C, B-D) individually but I am dealing with a large number of such pairs and want to avoid looping through them. Is there a clean way to do this operation in a single pass?
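For reference, here is a minimal sketch of two ways the attempts above might be made to work. It assumes the NaN result of the first attempt comes from pandas aligning the right-hand side on column labels (C and D do not match A and B), and that the broadcast error in the second comes from the mask having shape (4,) rather than (4, 1). The names dst, src, out1 and out2 are only illustrative, and the lists could presumably hold more columns as long as all pairs share the same mask.

```python
import numpy as np
import pandas as pd

data = [['A1', 'B1'], ['A2', 'B2', 1, 2], ['A3', 'B3', 3, 4], ['A4', 'B4']]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Illustrative names: destination columns and their matching source columns.
dst = ['A', 'B']
src = ['C', 'D']

# Rows where the source columns are populated.
m = df['C'].notna()

# Option 1: pass a plain array on the right-hand side so that pandas
# assigns by position instead of aligning on the column labels.
out1 = df.copy()
out1.loc[m, dst] = out1.loc[m, src].to_numpy()

# Option 2: reshape the mask to (n_rows, 1) so that it broadcasts
# against the two (n_rows, 2) arrays in np.where.
out2 = df.copy()
out2[dst] = np.where(m.to_numpy()[:, None],
                     out2[src].to_numpy(),
                     out2[dst].to_numpy())
```

Both variants leave C and D untouched and only overwrite A and B in the rows where C is not NaN. The copied values come from float columns, so they would show up as floats in A and B.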
I am using pandas version 0.25.3.
Thanks for the help!