Python Pandas Conditional Only Executes First "If" Statement

Question

I have a dataset 'df' like so:

a     b     c    e  f  g
1     Nan   Nan  2  3  4
Nan   1     Nan  2  3  4
Nan   Nan   1    2  3  4

I want to create a new column 'd' in my dataframe 'df' like so:

   a     b     c    d  e  f  g 
   1     Nan   Nan  1  2  3  4
   Nan   1     Nan  1  2  3  4
   Nan   Nan   1    1  2  3  4

So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'

I created this function below:

def func(df):
    if df['a'] is not None:
        return df['a']
    if df['b'] is not None:
        return df['b']
    if df['c'] is not None:
        return df['c']

and then used concatenated_df['d']=concatenated_df.apply(func, axis=1) to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'

I also tried modifying the function to

def func(df):
    if df['a'] is not None:
        return df['a']
    elif df['b'] is not None:
        return df['b']
    elif df['c'] is not None:
        return df['c']

but it still only executed the first conditional statement

So you just want the first value that isn't `None` from a b and c? — user3483203, Jul 09 '18 at 18:51
I will update the question with an example to make it more clear — crayfishcray, Jul 09 '18 at 18:51
Are you looking for not-None or not-NaN? (They are not the same.) — DYZ, Jul 09 '18 at 18:51
Possible duplicate of https://stackoverflow.com/questions/31828240/first-non-null-value-per-row-from-a-list-of-pandas-columns/31828559 — DYZ, Jul 09 '18 at 19:02

score 2 · Accepted Answer · answered Jul 09 '18 at 19:01

2

You can use first_valid_index

df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)

     a    b    c  d
0    1  NaN  NaN  1
1  NaN    1  NaN  1
2  NaN  NaN    1  1

answered Jul 09 '18 at 19:01

user3483203

50,081
9
65
94

thanks for this answer - I updated my question to be more specific - there are columns in my dataset that I don't want to use for constructing column 'd' (I only want to use columns 'a', 'b', and 'c'), so I think the other answer is better for my purposes. Thanks for your help though! – crayfishcray Jul 09 '18 at 19:15
1

If you'd like to apply this to a subset, you can just index your dataframe first: `df[['a', 'b', 'c']].apply(lambda x: x[x.first_valid_index()], axis=1)` – user3483203 Jul 09 '18 at 21:05

score 1 · Answer 2 · answered Jul 09 '18 at 19:04

1

One more option not mentioned in the duplicated question:

df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0    1.0
#1    1.0
#2    1.0

answered Jul 09 '18 at 19:04

DYZ

55,249
10
64
93

Python Pandas Conditional Only Executes First "If" Statement

2 Answers2