How to Replace NaN Values Meaningfully for Machine Learning

Question

I have a few categorical variables which I binary encoded.

The problem is that there are a lot of NaN values. I know I can just use df.fillna(0) to replace them, but will that be meaningful for machine learning?

Some columns have data and some are filled with NaNs, and this varies row by row.

How can I make the data useful? What specific operation is required?

No, you can't simply use fillna(0) every time. It depends on how your data is distributed and on the business logic — Mohamed Thasin ah, May 31 '18 at 05:49
Perhaps referring to this might help: https://machinelearningmastery.com/handle-missing-data-python/ — Bharath M Shetty, May 31 '18 at 06:13
You can also predict the missing data from other features. FYI: https://stackoverflow.com/questions/35680426/missing-value-in-data-analysis — Kinson, May 31 '18 at 07:12

score 4 · Answer 1 · edited May 31 '18 at 06:06

Missing values are very common, and there are various methods for filling them. Before filling, though, remember that the filled-in value should be reasonably close to the real data. For example, in financial analysis, when a customer's transaction value is missing, you should not put zero; instead, you could fill it with the mean or the median, depending on the data distribution.

How to fill missing data depends critically on the data and the business logic.
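To illustrate the transaction example, here is a minimal sketch. The amounts are made up for illustration and are not from the question. It compares the mean of the observed values with the mean after a zero fill and after a median fill.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction amounts; two of the five are missing.
amounts = pd.Series([120.0, 80.0, np.nan, 100.0, np.nan], name="amount")

zero_filled = amounts.fillna(0)
median_filled = amounts.fillna(amounts.median())

# Series.mean() skips NaN by default, so this is the mean of observed values.
observed_mean = amounts.mean()
zero_mean = zero_filled.mean()
median_mean = median_filled.mean()

print(observed_mean, zero_mean, median_mean)
```

With these numbers the zero fill pulls the mean from 100 down to 60, while the median fill leaves it at 100. Whether that matters depends on what a missing value actually means in your data.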

You could fill the values using one of the following methods.

Filling with a constant:

df.fillna(0)

Filling with the mean, median, etc.:

df.fillna(df.mean(numeric_only=True))

Filling with the per-group mean, median, etc. (using groupby):

df['a'].fillna(df.groupby('b')['a'].transform('mean'))
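The three approaches above can be compared side by side on a toy frame. This is only a sketch: the column names 'a' and 'b' follow the snippets above, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (assumed for illustration): 'b' is a grouping column,
# 'a' is a numeric column with one missing value per group.
df = pd.DataFrame({
    "b": ["x", "x", "x", "y", "y", "y"],
    "a": [1.0, np.nan, 3.0, 10.0, 20.0, np.nan],
})

# 1. Constant fill
constant = df["a"].fillna(0)

# 2. Fill with the overall column mean (NaN is ignored when computing it)
overall = df["a"].fillna(df["a"].mean())

# 3. Fill with the mean of the row's own group in 'b'
by_group = df["a"].fillna(df.groupby("b")["a"].transform("mean"))

print(pd.DataFrame({"constant": constant, "overall": overall, "by_group": by_group}))
```

Note that fillna returns a new object, so assign the result back (for example, df['a'] = ...) if you want to keep it. The group-wise fill uses 2.0 for group x and 15.0 for group y here, which is often closer to the real value than the overall mean of 8.5 when groups differ a lot.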

There are many other methods; for details, please visit here
