fill new column of pandas DataFrame based on if-else of other columns

Question

I want to create a new column in a pandas DataFrame and populate it according to conditions on two other columns. For example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([['value1','value2'],['value',np.nan],[np.nan,np.nan]]), columns=['col1','col2'])

I would like the new column, 'new col', to contain: 1) the value in 'col2' if it is not NaN; otherwise 2) the value in 'col1' if it is not NaN; otherwise 3) NaN.

I am trying this function with .apply(), but it does not return the desired result:

def singleval(row):
    if row['col2'] != np.nan:
        val = row['col2']
    elif row['col1'] != np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval,axis=1)

I want the values in 'new col' to be ['value2', 'value', NaN].

Erfan · Accepted Answer · 2019-05-13T23:40:12.537

Method 1 `fillna`

In this case, we can simply use fillna on col2 with values from col1:

df['new col'] = df['col2'].fillna(df['col1'])

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
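`fillna` fills only the missing positions, aligning the replacement Series on the index. A minimal sketch of how this chains when there are more fallbacks, assuming a hypothetical third column `col0` that is not in the question; `Series.combine_first` is shown as an equivalent spelling for the two-column case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': ['a', 'b', 'c'],  # hypothetical extra fallback column
                   'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Priority: col2 first, then col1, then col0; each fillna only fills the gaps left so far.
df['new col'] = df['col2'].fillna(df['col1']).fillna(df['col0'])

# For two columns, combine_first gives the same result as fillna.
same = df['col2'].combine_first(df['col1'])

print(df['new col'].tolist())  # ['value2', 'value', 'c']
```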

Method 2 `np.select`

If you have multiple conditions, use np.select: pass it a list of conditions and a matching list of choices, and each row gets the choice belonging to the first condition that is True:

conditions = [
    df['col2'].notnull(),
    df['col1'].notnull(),
]

choices = [df['col2'], df['col1']]

df['new col'] = np.select(conditions, choices, default=np.nan)

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
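Because `np.select` takes the first condition that is True in each row, the order of the lists encodes the priority. A sketch that builds both lists from a single priority list (the name `priority` is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

priority = ['col2', 'col1']  # checked left to right; the first non-null value wins

conditions = [df[c].notna() for c in priority]
choices = [df[c] for c in priority]

# np.select picks the choice of the first True condition in each row,
# falling back to `default` when no condition is True.
df['new col'] = np.select(conditions, choices, default=np.nan)
```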

Note

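Your DataFrame wasn't built correctly: `np.array` coerces the mix of strings and `np.nan` to a single string dtype, so the missing values became the literal string 'nan'. Use this one instead to test: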

df = pd.DataFrame({'col1':['value1', 'value', np.nan],
                   'col2':['value2', np.nan, np.nan]})
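A short demonstration of the problem with the original construction, plus one possible repair with `DataFrame.replace` (the repair is a suggestion, not part of the answer):

```python
import numpy as np
import pandas as pd

# Mixing strings and np.nan in one NumPy array forces a single string dtype,
# so np.nan is stored as the three-character string 'nan'.
arr = np.array([['value1', 'value2'], ['value', np.nan], [np.nan, np.nan]])
print(arr.dtype.kind)   # 'U' (Unicode string)
print(arr[1, 1] == 'nan')  # True: a string, not a float

bad = pd.DataFrame(arr, columns=['col1', 'col2'])
print(bad.isna().any().any())  # False: nothing is recognised as missing

# One possible repair: turn the literal 'nan' strings back into real missing values.
fixed = bad.replace('nan', np.nan)
print(fixed.isna().sum().tolist())  # [1, 2]
```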

Edit: why was the function not working?

np.nan == np.nan returns False,
while np.nan is np.nan returns True.

See this question for the explanation (in short, IEEE 754 defines NaN as unequal to every value, including itself).
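A quick check of these comparisons, alongside value-based tests (`math.isnan`, `pd.isna`) that do not depend on object identity; `x` is an illustrative name:

```python
import math
import numpy as np
import pandas as pd

x = float('nan')  # a NaN that is a different object from np.nan

print(np.nan == np.nan)  # False: NaN is unequal to everything, itself included
print(np.nan != np.nan)  # True, which is why the `!=` checks in the original function always pass
print(x is np.nan)       # False: identity holds only for the very same object
print(math.isnan(x), pd.isna(x), pd.isna(np.nan))  # True True True
```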

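So your function works if you use `is not` instead of `!=`. Note that this relies on the cell holding the very same `np.nan` object, which is the case for the DataFrame above but is not guaranteed in general (for example, `float('nan') is np.nan` is False):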

def singleval(row):
    if row['col2'] is not np.nan:
        val = row['col2']
    elif row['col1'] is not np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval, axis=1)

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
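A variant of the function that tests missingness by value with `pd.notna` instead of by identity, sketched on the corrected DataFrame; it should give the same result here while also handling `None` or a NaN that is not the `np.nan` object:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

def singleval(row):
    # pd.notna checks missingness by value (NaN, None, NaT),
    # so it does not depend on the cell holding the np.nan object itself.
    if pd.notna(row['col2']):
        return row['col2']
    if pd.notna(row['col1']):
        return row['col1']
    return np.nan

df['new col'] = df.apply(singleval, axis=1)
```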

struggling to see why your df is different from my df...nevermind: looks like it has to do with np.array() — laszlopanaflex, May 13 '19 at 23:24
Not sure either, would be a good question on SO as well :). @laszlopanaflex — Erfan, May 13 '19 at 23:25
thank you the 2 solutions. is it possible to explain why my original approach didn't work? im not able to see where the if-elif-else approach breaks down... — laszlopanaflex, May 13 '19 at 23:26
Added explanation about your approach @laszlopanaflex, good question btw! — Erfan, May 13 '19 at 23:40

Quang Hoang · Answer 2 · 2019-05-14T01:34:07.100


Try this:

df['col3'] = df[['col1','col2']].stack().groupby(level=0).last()

output:

    col1    col2    col3
0   value1  value2  value2
1   value   nan     value
2   nan     nan     nan
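How the `stack` / `groupby` idiom arrives at the answer, step by step on the corrected DataFrame. `last()` returns the last non-null value per row in column order, so the columns are listed from lowest to highest priority. Whether `stack()` already drops the missing values depends on the pandas version, but `last()` skips them either way:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# stack() turns each row into (row label, column name) -> value entries.
stacked = df[['col1', 'col2']].stack()

# groupby(level=0) regroups the entries by original row label; last() keeps the
# last non-null value, i.e. col2 when present, else col1. Rows with no value get NaN.
df['col3'] = stacked.groupby(level=0).last()
```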

answered May 13 '19 at 23:18 by Quang Hoang (edited May 14 '19 at 01:34)

score 0 · Answer 3 · answered May 14 '19 at 01:26


Use `df.ffill` along `axis=1` and take `col2`:

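# carry values rightward across each row; col2 then holds col2, else col1
df['new_col'] = df[['col1', 'col2']].ffill(axis=1)['col2']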

     col1    col2 new_col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
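`ffill(axis=1)` works because the last column ends up holding the rightmost non-null value in each row. A mirror-image sketch that reads more naturally with many columns, assuming they are listed highest priority first (`priority` is an illustrative name): back-fill along each row and keep the first column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

priority = ['col2', 'col1']  # highest priority first

# bfill(axis=1) pulls later non-null values leftward into gaps,
# so the first column holds the first non-null value in priority order.
df['new_col'] = df[priority].bfill(axis=1).iloc[:, 0]
```

With this form, adding another fallback only means appending a column name to `priority`.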

answered May 14 '19 at 01:26 by Andy L.