0

I have a dataset with 200 columns and 30k rows. Some values are missing, and I'd like to predict them and fill them in. That is, I want to predict the None values and put the predictions in their place: split the data by index, train a model on the known rows, predict the unknown values, join the known and predicted values, and write them back into the data at exactly the same positions.

P.S. I'm not interested in the median, dropna, or similar methods, only in predicting the missing values.

import pandas as pd

df = {'First' : [30, 22, 18, 49, 22], 'Second' : [80, 28, 16, 56, 30], 'Third' : [14, None, None, 30, 27], 'Fourth' : [14, 85, 17, 22, 14], 'Fifth' : [22, 33, 45, 72, 11]}
df = pd.DataFrame(df, columns = ['First', 'Second', 'Third', 'Fourth', 'Fifth'])

Expected result: the same DataFrame with every column filled in.

Erfan
Nigel
  • Perhaps you could tell us what you tried and what your specific doubt is? As it is, the question reads a bit more like a homework assignment! – Pablo Jul 15 '19 at 13:43
  • As I mentioned earlier, I have a dataset with 200 columns and 30k rows. A large amount of data is missing, and I would like to predict it and fill the empty cells with the predictions. After that I can move on to a prediction model for the whole dataset. Sorry, I don't understand what sort of additional information might be helpful. – Nigel Jul 15 '19 at 13:51
  • Perhaps try to split the question to find the part that is bothering you. Do you know how to split your data? Do you know how to predict your missing values? Is your trouble only how to replace NaNs with a custom function? – Pablo Jul 15 '19 at 14:13
  • The problem is how to join the predicted data with the known data and then put it back in the place where the column was. – Nigel Jul 15 '19 at 14:19
  • What do you mean by "connect predicted data with known data"? We can worry about the rest later. – Pablo Jul 15 '19 at 14:32

2 Answers

0

I do not really understand your question either, but I might have an idea for you. Have a look at the fancyimpute package, which offers imputation methods based on predictive models (e.g. KNN). I hope this solves your problem.
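To give an idea of what KNN-based imputation looks like, here is a minimal sketch on the toy frame from the question. Note that it does not use fancyimpute itself: it swaps in scikit-learn's `KNNImputer`, which follows a similar idea, and the choice of `n_neighbors=2` is only an assumption for this tiny example.

```python
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data from the question; None becomes NaN in the 'Third' column.
df = pd.DataFrame({
    'First': [30, 22, 18, 49, 22],
    'Second': [80, 28, 16, 56, 30],
    'Third': [14, None, None, 30, 27],
    'Fourth': [14, 85, 17, 22, 14],
    'Fifth': [22, 33, 45, 72, 11],
})

# Each missing value is replaced by the mean of that column over the
# 2 nearest rows (distance is computed on the columns that are present).
imputer = KNNImputer(n_neighbors=2)

# fit_transform returns a NumPy array, so wrap it back into a DataFrame
# with the original columns and index to keep every value in its place.
filled = pd.DataFrame(imputer.fit_transform(df),
                      columns=df.columns, index=df.index)
print(filled)
```

Because the result is rebuilt with the original index and column order, the known values stay where they were and only the NaN cells change.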

Kalle
0

It is hard to understand the question. However, it seems like you may be interested in this question and the answer.

Using a custom function Series in fillna

Basically (from the link), you would

  1. create a column with predicted values
  2. use fillna with that column as parameter
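The two steps above could be sketched roughly as follows, using the toy frame from the question. The model is an assumption made for illustration: a plain least-squares fit via `np.linalg.lstsq` on the other columns, standing in for whatever predictor you actually train. The names `target`, `features`, `design` and `predicted` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First': [30, 22, 18, 49, 22],
    'Second': [80, 28, 16, 56, 30],
    'Third': [14, None, None, 30, 27],
    'Fourth': [14, 85, 17, 22, 14],
    'Fifth': [22, 33, 45, 72, 11],
})

target = 'Third'
features = [c for c in df.columns if c != target]

# Split rows by whether the target is known (boolean mask on the index).
known = df[target].notna()

def design(frame):
    """Feature matrix with a leading column of ones for the intercept."""
    X = frame[features].to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])

# Fit on the known rows only (stand-in for any regression model).
coef, *_ = np.linalg.lstsq(design(df[known]),
                           df.loc[known, target].to_numpy(dtype=float),
                           rcond=None)

# Step 1: predicted values, carrying the index of the rows they belong to.
predicted = pd.Series(design(df[~known]) @ coef, index=df.index[~known])

# Step 2: fillna aligns on the index, so each prediction lands in the
# exact row it was computed for; known values are left untouched.
filled = df.copy()
filled[target] = df[target].fillna(predicted)
print(filled)
```

The key point is that `predicted` keeps the original row labels, so no manual joining or reordering is needed. For 200 columns, the same pattern can be repeated in a loop over each column that contains missing values.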
Pablo