1

I have a DataFrame with some null values that I want to replace with mean values stored in another DataFrame. I've written a function that I want to apply with a lambda, but I keep getting an error.

I have a DataFrame like this:

CustomerType  Category     Satisfaction   Age
Not Premium   Electronics  Not Satisfied  NaN
Not Premium   Beauty       Satisfied      NaN
Premium       Sports       Satisfied      38.0
Not Premium   Sports       Not Satisfied  NaN

I need to fill it with this data:

CustomerType  Satisfaction   Age
Not Premium   Not Satisfied  32.440740
Not Premium   Satisfied      28.896348
Premium       Not Satisfied  43.767723
Premium       Satisfied      44.075901

So I've created a function:

def fill_age(x):
    if x.isnull()== True:
        return[(grp.CustomerType==x.CustomerType) | (grp.Satisfaction==x.Satisfaction)]['Age'].values[0]

I would like to apply it to my DataFrame with a lambda function that iterates through all the rows:

df['Age'] = [df.apply(lambda x: fill_age(x) if np.isnan(x['Age']) else 
                                            x['Age'], axis=1) for x in df]

But I keep getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can anyone help me?

Sunderam Dubey
bancaletto

2 Answers

0

Assuming the problem is that apply is being called incorrectly on your DataFrame, and that fill_age() works correctly on df["Age"] values, you need to replace that statement. The version below evaluates each value x and, through the if-else conditional, either keeps the current Age or replaces it with the external data. This code shouldn't return errors:

df["Age"] = df["Age"].apply(lambda x: fill_age(x) if np.isnan(x) else x)
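If fill_age needs the CustomerType and Satisfaction columns to find the right mean, it has to receive the whole row rather than a single Age value. The following is only a sketch of that row-wise variant, rebuilt from the sample data in the question. It assumes the means table is named grp, that both keys must match (so it uses & rather than |), and that every key pair in df exists in grp. pd.isna on a scalar returns a plain bool, which avoids the "truth value of a Series is ambiguous" error.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})

# Means table; the name grp is taken from the question's function.
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})

def fill_age(row):
    """Return the row's Age, or the matching mean from grp when Age is missing."""
    if pd.isna(row["Age"]):  # scalar check, so the result is a plain bool
        match = grp[(grp["CustomerType"] == row["CustomerType"])
                    & (grp["Satisfaction"] == row["Satisfaction"])]
        # Assumption: a matching row exists; otherwise this raises IndexError.
        return match["Age"].values[0]
    return row["Age"]

# axis=1 passes each row as a Series, so no list comprehension is needed.
df["Age"] = df.apply(fill_age, axis=1)
print(df)
```

This keeps the shape of the original function, but it runs a Python-level lookup per row, so it may be slow on large frames.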
user11717481
  • Still not working, pretty sure the error is on the fill_age function. I get: "TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''. " – bancaletto Mar 08 '22 at 07:34
0

We should try to avoid using apply, so we could use this instead:

df['Age'] = df['Age'].fillna(
    df.groupby(['CustomerType', 'Satisfaction'])['Age'].transform('first')
)
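The snippet above fills each gap from ages already present in df within the same group. If the means live in a separate DataFrame, as the question describes, one possible apply-free sketch is a left merge on the two key columns followed by fillna. The name grp for the means table and the assumption that it holds exactly one row per (CustomerType, Satisfaction) pair are taken from the question's sample, not confirmed by the asker.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})

# Means table; assumed to have one row per key pair.
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})

keys = ["CustomerType", "Satisfaction"]

# A left merge keeps df's row order; with unique keys in grp it also keeps the row count.
means = df[keys].merge(grp, on=keys, how="left")["Age"]

# The merge result has a fresh index, so realign it to df.index before filling.
df["Age"] = df["Age"].fillna(pd.Series(means.to_numpy(), index=df.index))
print(df)
```

Rows whose key pair is missing from grp simply stay NaN here instead of raising an error.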
ansev