I am trying to replace the null values in a column based on the categorical value of another column, but the == operator is making me regret all the big decisions in my life. I have 8523 rows and 12 columns in the train set, of which 7 are categorical and 5 are numerical.

Columns are 'Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales'

I want to fill the NaN values (float dtype) in the 'Item_Weight' column based on the categorical value of 'Outlet_Location_Type'. I have a dictionary (city_type_mean) with the categorical values as keys and the corresponding replacement values as values. I used the following code:

train["Item_Weight"] = train.apply(lambda x: city_type_mean[x['Outlet_Location_Type']] if x["Item_Weight"] == np.nan else x["Item_Weight"], axis=1) 

But the NaN values remain unaffected. I am attaching a sample of the train data and an image of the problematic code snippet.

What I have found so far is that the if condition above always evaluates to False, so the else branch is always executed. I have also tried the condition with is and with pd.isnull(), but to no avail. Any help with the problem is much appreciated. Also, please let me know before marking this question as a duplicate.

Ananthu
  • instead of `== np.nan` you should use `x["Item_Weight"].isna()`. You can't compare anything to `np.nan` – Stef Feb 13 '21 at 12:39
  • The short of it: it works as expected and as [documented in the big yellow Warn box](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#values-considered-missing) – Patrick Artner Feb 13 '21 at 12:48
  • @Stef I tried the isna() method, but it raises an AttributeError: inside the lambda expression, x["Item_Weight"] is a single float value from the 'Item_Weight' column, not the column itself, so similar methods such as isnull() won't work either. – Ananthu Feb 13 '21 at 13:02

1 Answer

Can you please try np.isnan instead of == np.nan?

train["Item_Weight"] = train.apply(lambda x: city_type_mean[x['Outlet_Location_Type']] if np.isnan(x["Item_Weight"]) else x["Item_Weight"], axis=1)
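
To see why the original condition never fires, here is a minimal sketch. The tiny `train` frame and the numbers in `city_type_mean` are made up for illustration and are not the asker's data. It shows that NaN never compares equal to anything, itself included, and also sketches a vectorized alternative to `apply` that combines `Series.map` with `Series.fillna`.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself,
# so a condition like `x == np.nan` is always False.
nan_equals_itself = (np.nan == np.nan)   # False
nan_detected = bool(np.isnan(np.nan))    # True

# Hypothetical miniature of the train set (values are invented).
train = pd.DataFrame({
    "Outlet_Location_Type": ["Tier 1", "Tier 2", "Tier 1", "Tier 3"],
    "Item_Weight": [9.3, np.nan, np.nan, 12.0],
})
city_type_mean = {"Tier 1": 10.0, "Tier 2": 11.0, "Tier 3": 12.5}

# Vectorized alternative: look up a fill value for every row,
# then use it only where Item_Weight is missing.
fill_values = train["Outlet_Location_Type"].map(city_type_mean)
train["Item_Weight"] = train["Item_Weight"].fillna(fill_values)
```

The row-wise `apply` with `np.isnan` above gives the same result; the `map` plus `fillna` form simply avoids calling a Python function once per row.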
r.burak
  • Thanks a lot @r.b.leon. You are definitely one of my favorite people now. Works like a charm. – Ananthu Feb 13 '21 at 13:06
  • Very welcome :) Your positive attitude makes me smile :) Enjoy. – r.burak Feb 13 '21 at 13:16
  • On the same note, how can I find the null values in an object-type column of a df? I tried np.isnan, but it throws a TypeError. Can you help me out, @r.b.leon? – Ananthu Feb 17 '21 at 04:05
  • It will not work for strings. You can use the pandas isna function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isna.html – r.burak Feb 18 '21 at 08:27
  • Sorry, I forgot to add that it is also within a lambda expression, so the isna() function will not be suitable. – Ananthu Feb 19 '21 at 05:08
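
On the last point, the top-level function `pd.isna` (as opposed to the `Series.isna` method) also accepts a single value, so it can be used inside a row-wise lambda on an object column. The sketch below assumes a hypothetical mapping `size_by_type` and invented data; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an object column with missing entries (NaN and None).
df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Supermarket Type1", "Grocery Store"],
    "Outlet_Size": ["Small", np.nan, None],
})
# Made-up replacement values, keyed by Outlet_Type.
size_by_type = {"Grocery Store": "Small", "Supermarket Type1": "Medium"}

# pd.isna works on one scalar at a time (str, float NaN, or None),
# so it does not raise the TypeError that np.isnan raises on strings.
df["Outlet_Size"] = df.apply(
    lambda row: size_by_type[row["Outlet_Type"]]
    if pd.isna(row["Outlet_Size"])
    else row["Outlet_Size"],
    axis=1,
)
```

Unlike `np.isnan`, `pd.isna` treats both `np.nan` and `None` as missing, which is why it suits mixed object columns.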