
I'm following the answer from this question

I have a df like this:

score_1   score_2  
1.11        NaN      
2.22        3.33      
NaN         3.33      
NaN         NaN
........       

The rule for calculating final_score is that at least one of the scores must be non-null. If one of the scores is null, final_score equals the other score (it carries all the weight). This is the code to reproduce the problem:

import numpy as np
import pandas as pd

df = pd.DataFrame({
            'score_1': [1.11, 2.22, np.nan],
            'score_2': [np.nan, 3.33, 3.33]
        })

def final_score(df):
    if (df['score_1'] != np.nan) and (df['score_2'] != np.nan):
        print('I am condition one')
        return df['score_1'] * 0.2 + df['score_2'] * 0.8

    elif (df['score_1'] == np.nan) and (df['score_2'] != np.nan):
        print('I am the condition two')
        return df['score_2']

    elif (df['score_1'] != np.nan) and (df['score_2'] == np.nan):
        print('I am the condition three')
        return df['score_1']

    elif (df['score_1'] == np.nan) and (df['score_2'] == np.nan):
        print('I am the condition four')
        return np.nan

df['final_score'] = df.apply(final_score, axis=1)
print(df)

This gave me output:

score_1   score_2  final_score
1.11        NaN       NaN
2.22        3.33      3.108
NaN         3.33      NaN
NaN         NaN       NaN
........ 

But my expected output is below:

score_1   score_2  final_score
1.11        NaN       1.11
2.22        3.33      3.108
NaN         3.33      3.33
NaN         NaN       NaN
........ 

The first and third rows are not the result I'm expecting. Can someone help me figure out what's wrong with my code? Thanks a lot.

wawawa

2 Answers


Let's apply your conditions using np.where:

df['final_score'] = np.where(df.notna().all(1), df['score_1'] * 0.2 + df['score_2'] * 0.8, df.mean(1))



   score_1  score_2  final_score
0     1.11      NaN        1.110
1     2.22     3.33        3.108
2      NaN     3.33        3.330
3      NaN      NaN          NaN
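
To make the one-liner easier to read, here is a minimal sketch that splits it into named steps. It assumes the frame holds only the two score columns, as in the question; the variable names `both_present`, `weighted` and `row_mean` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'score_1': [1.11, 2.22, np.nan, np.nan],
    'score_2': [np.nan, 3.33, 3.33, np.nan],
})

# True only for rows where every column is non-null.
both_present = df.notna().all(axis=1)

# Weighted score; it is NaN whenever either input is NaN.
weighted = df['score_1'] * 0.2 + df['score_2'] * 0.8

# Row-wise mean. NaN values are skipped by default, so a row with one
# score returns that score, and a row with no scores returns NaN.
row_mean = df.mean(axis=1)

# Pick the weighted score where both are present, else the row mean.
df['final_score'] = np.where(both_present, weighted, row_mean)
print(df)
```

The row mean works as the fallback only because, with at most one non-null score in the row, the mean of the remaining values is that score itself.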
wwnde
  • Hi thanks, this is much more simplified, but not very readable for people not familiar with what I'm doing, just wondering why we use `df.mean(1)` here? – wawawa Nov 24 '21 at 14:18
  • df.mean(1) is the mean along axis 1, that is, the mean across each row. np.where(condition, value to use where the condition is met, value to use where it is not) – wwnde Nov 24 '21 at 14:26
  • Hi, if I still want to use my original code, how can I update it to make it work? Your one-line code doesn't seem to work for my 'large' dataset, which has lots of other columns, so I still want to figure out what the issue is in my original code. Thanks. – wawawa Nov 24 '21 at 14:28
  • np.where is vectorized and much faster, which matters unless speed and compute resources are a non-issue – wwnde Nov 24 '21 at 14:31
  • But the problem is that I have other columns with different values, so this one-line code doesn't work for me. I can't just use `df.mean(1)` – wawawa Nov 24 '21 at 14:32
  • `df['final_score'] = np.where(df[['score_1','score_2']].notna().all(1), df['score_1'] * 0.2 + df['score_2'] * 0.8, df.mean(1))`. Try that and let me know. Typing on my phone, so I didn't test it. Basically, it subsets the two columns to check whether both of them are non-NaN – wwnde Nov 24 '21 at 14:37
  • `df['final_score'] =np.where(df[['score_1','score_2']].notna().all(1),df['score_1'] * 0.2 + df['score_2'] * 0.8,df[['score_1','score_2']].mean(1))` this works! Thanks! – wawawa Nov 24 '21 at 14:40
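
The version that worked in the comments can be sketched as follows for a frame that has other columns. The `other` column and the `cols` list are hypothetical, added only to illustrate why both the null check and the mean need to be restricted to the score columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'score_1': [1.11, 2.22, np.nan, np.nan],
    'score_2': [np.nan, 3.33, 3.33, np.nan],
    'other': [100.0, 200.0, 300.0, 400.0],  # hypothetical unrelated column
})

# Restrict both the null check and the fallback mean to the score columns,
# so unrelated columns cannot leak into the result.
cols = ['score_1', 'score_2']

df['final_score'] = np.where(
    df[cols].notna().all(axis=1),
    df['score_1'] * 0.2 + df['score_2'] * 0.8,
    df[cols].mean(axis=1),
)
print(df)
```

With an unrestricted `df.mean(1)`, the fallback for the first row would average 1.11 with 100.0 instead of returning 1.11.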

Using np.isnan() for the comparison should solve the problem, because NaN never compares equal to anything, including itself.

JEFFRIN JACOB
  • Hi, can you be more specific? I tried `elif df['score_1'].isnan() and df['score_2'].notnan():` but it's not working – wawawa Nov 24 '21 at 14:24
  • What I meant was to use np.isnan() in the following way: `elif (np.isnan(df['score_1']) and not np.isnan(df['score_2'])):` – JEFFRIN JACOB Nov 25 '21 at 03:18
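
For anyone who wants to keep the row-wise approach from the question, here is a minimal sketch of a corrected function. It swaps the `!= np.nan` and `== np.nan` comparisons for `pd.notna`, which is one option alongside `np.isnan`; the local names `s1` and `s2` are illustrative. In the original code, `x != np.nan` is always True, so the first branch runs for every row and returns NaN whenever a score is missing.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself.
print(np.nan != np.nan)  # True
print(np.nan == np.nan)  # False

df = pd.DataFrame({
    'score_1': [1.11, 2.22, np.nan, np.nan],
    'score_2': [np.nan, 3.33, 3.33, np.nan],
})

def final_score(row):
    s1, s2 = row['score_1'], row['score_2']
    if pd.notna(s1) and pd.notna(s2):
        # Both scores present: weighted combination.
        return s1 * 0.2 + s2 * 0.8
    if pd.notna(s2):
        # Only score_2 present: it carries all the weight.
        return s2
    if pd.notna(s1):
        # Only score_1 present: it carries all the weight.
        return s1
    # Neither score present.
    return np.nan

df['final_score'] = df.apply(final_score, axis=1)
print(df)
```

This runs the function once per row, so it will be slower than the np.where version on large frames, but the branches map directly onto the four cases in the question.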