0

I have a dataset 'df' like so:

a     b     c    e  f  g
1     Nan   Nan  2  3  4
Nan   1     Nan  2  3  4
Nan   Nan   1    2  3  4

I want to create a new column 'd' in my dataframe 'df' like so:

   a     b     c    d  e  f  g 
   1     Nan   Nan  1  2  3  4
   Nan   1     Nan  1  2  3  4
   Nan   Nan   1    1  2  3  4

So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'

I created this function below:

def func(df):
    if df['a'] is not None:
        return df['a']
    if df['b'] is not None:
        return df['b']
    if df['c'] is not None:
        return df['c']

and then used concatenated_df['d']=concatenated_df.apply(func, axis=1) to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'

I also tried modifying the function to

def func(df):
    if df['a'] is not None:
        return df['a']
    elif df['b'] is not None:
        return df['b']
    elif df['c'] is not None:
        return df['c']

but it still only executed the first conditional statement

crayfishcray
  • 379
  • 4
  • 15

2 Answers2

2

You can use first_valid_index

df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)

     a    b    c  d
0    1  NaN  NaN  1
1  NaN    1  NaN  1
2  NaN  NaN    1  1
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • thanks for this answer - I updated my question to be more specific - there are columns in my dataset that I don't want to use for constructing column 'd' (I only want to use columns 'a', 'b', and 'c'), so I think the other answer is better for my purposes. Thanks for your help though! – crayfishcray Jul 09 '18 at 19:15
  • 1
    If you'd like to apply this to a subset, you can just index your dataframe first: `df[['a', 'b', 'c']].apply(lambda x: x[x.first_valid_index()], axis=1)` – user3483203 Jul 09 '18 at 21:05
1

One more option not mentioned in the duplicated question:

df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0    1.0
#1    1.0
#2    1.0
DYZ
  • 55,249
  • 10
  • 64
  • 93