
Suppose I have a dataframe like the one below:

import pandas as pd
import numpy as np
d = {'Column 1': [10, 12, 13, 43, np.nan],
     'Column2': [np.nan, 7, np.nan, 49, 8]}
df = pd.DataFrame(d)

I would like to create a third column that takes its values from Column2, falling back to Column 1 wherever Column2 is NaN. For the data above, the new column would contain 10, 7, 13, 49, 8.

I have found multiple topics/solutions where the condition depends on the values in a single column, but none where the new column has to draw its data from more than one column.

kawuel
  • Not sure which "multiple topics/solutions" you found, but this is a duplicate of https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column. –  Jan 26 '22 at 15:18
  • `df['col3'] = df['col2'].fillna(df['col1'])` – ansev Jan 26 '22 at 15:39

2 Answers


You could use mask:

df['Column3'] = df['Column2'].mask(df['Column2'].isna(), df['Column 1'])

A more generic version (uses any number of columns) would be to take the last valid value per row:

df['Column3'] = df.ffill(axis=1).iloc[:, -1]

output:

   Column 1  Column2  Column3
0      10.0      NaN     10.0
1      12.0      7.0      7.0
2      13.0      NaN     13.0
3      43.0     49.0     49.0
4       NaN      8.0      8.0
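
To illustrate the "last valid value per row" idea with more than two columns, here is a minimal sketch. The `Extra` column and the `last_valid` name are hypothetical additions for illustration only; they are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Data from the question, plus a hypothetical third source column.
df = pd.DataFrame({
    'Column 1': [10, 12, 13, 43, np.nan],
    'Column2': [np.nan, 7, np.nan, 49, 8],
    'Extra': [np.nan, np.nan, 5, np.nan, np.nan],  # hypothetical, not in the question
})

# Forward-fill along each row (axis=1) so every NaN inherits the value to its
# left, then keep the right-most column: the last valid value of each row.
last_valid = df.ffill(axis=1).iloc[:, -1]

df['Column3'] = last_valid
print(df)
```

Note that this treats the right-most column as the highest priority, so the column order matters. Compute the result before assigning `Column3`, or select the source columns explicitly, so that an existing `Column3` does not take part in the fill.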
mozway
  • I think checking for NaN is not necessary – ansev Jan 26 '22 at 15:45
  • @ansev there are many ways, I mostly provided an answer for the second option that I found more interesting (but which unfortunately did not seem to be used) – mozway Jan 26 '22 at 15:49

You only need:

df['Column3'] = df['Column2'].fillna(df['Column 1'])

Or:

df['Column3'] = df['Column2'].combine_first(df['Column 1'])
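
As a quick check, the following sketch runs both calls on the question's dataframe and compares the results. The variable names `via_fillna`, `via_combine`, and `same` are just illustrative.

```python
import numpy as np
import pandas as pd

# Dataframe from the question; note the space in 'Column 1'.
df = pd.DataFrame({
    'Column 1': [10, 12, 13, 43, np.nan],
    'Column2': [np.nan, 7, np.nan, 49, 8],
})

# Take Column2 and fill its NaNs with the value from Column 1 in the same row.
via_fillna = df['Column2'].fillna(df['Column 1'])

# combine_first does the same here: it prefers Column2 and falls back to Column 1.
via_combine = df['Column2'].combine_first(df['Column 1'])

# Both approaches should agree on this data.
same = via_fillna.equals(via_combine)
print(same)

df['Column3'] = via_fillna
print(df)
```

The two can differ when the Series do not share an index: `combine_first` returns the union of both indexes, while `fillna` keeps the index of the Series it is called on. Both columns of one dataframe share an index, so that difference does not arise here.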
ansev