I would be very grateful if somebody could explain how KNN imputation works and how it can be used to fill NAs and empty factor/character values based on similar records. For example:

   KL_ID freq1 freq2 total1 total2 type1 type2 margin_visit margin_total
264149 462132    24    27 529.05 555.48   low   low    12.500000     4.995747
131702 277868    24    22 154.63 122.21   low   low    -8.333333   -20.966177
284924 488875   107   107 646.43 816.82  high  high     0.000000    26.358616
281236 484241    14    32 365.64 942.75   low   low   128.571429   157.835576
144396 295443     0     1   0.00  19.56     0   low          Inf          Inf
143278 293956     2     0 121.71   0.00   low     0  -100.000000  -100.000000
457256 730168     1    12  48.55 107.89   low   low  1100.000000   122.224511
151368 304711    28    30 997.60 919.11   low   low     7.142857    -7.867883
219131 399018     2     0  18.11   0.00   low     0  -100.000000  -100.000000
392124 641192     4     6  25.50  32.48   low   low    50.000000    27.372549
56849  172985     9     1 116.75  14.34   low   low   -88.888889   -87.717345
14950  113654     1     1  28.69  43.46   low   low     0.000000    51.481352
534871 828187    17    33  36.74 136.50   low   low    94.117647   271.529668
152378 306057    35     8 410.54 101.38   low   low   -77.142857   -75.305695
189103 357116    33    10 231.65  38.60   low   low   -69.696970   -83.336931
           kltype VANUS  SUGU RAHVUS INFOKOJU
264149 nonchurner    NA  <NA>   <NA>     <NA>
131702    churner    59 naine    EST        J
284924 nonchurner    NA  <NA>   <NA>     <NA>
281236 nonchurner    NA  <NA>   <NA>     <NA>
144396 nonchurner    39 naine    EST        J
143278    churner    35 naine    EST        E
457256 nonchurner    22  mees    RUS        J
151368    churner    41 naine    EST        J
219131    churner    NA  <NA>   <NA>     <NA>
392124 nonchurner    NA  <NA>   <NA>     <NA>
56849     churner    41 naine    EST        J
14950  nonchurner    55  mees    EST        J
534871 nonchurner    NA  <NA>   <NA>     <NA>
152378    churner    32  mees    RUS        J
189103    churner    43  mees    EST        J


As can be seen, there are a lot of missing values in the dataset. How can we impute logical (that is, plausible) values for the character and factor columns? I understand that numeric columns can be imputed with the zoo package.

Thanks for the help.

Prashanth
  • Each column of a `data.frame` has a specific type. You cannot put a `logical` value in a `character` column; it would be instantly coerced to `character` which is a more expensive type. How do you want to replace the `NA`s? With which logic? Please, provide the expected output. – nicola Feb 22 '16 at 11:46
  • @nicola: Okay. I want to replace the NAs in SUGU using other records that are similar, as in KNN imputation.
    I don't want a logical value in a character column; what I need is for each NA to be replaced by the value from a similar record, where similarity is judged on the other columns.
    If SUGU is missing for a record and another record is close to it on the other columns, then the NA is replaced by that record's value.
    This is just my idea, but you can suggest other ways.
    – Prashanth Feb 22 '16 at 12:04
  • The point here is that my dataset contains 542099 records, and omitting the NAs leaves me with just 190501 records. I would be discarding a huge amount of data, which introduces bias when I try to do prediction on it. – Prashanth Feb 22 '16 at 12:09
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and use `dput` vs printing your data. I have an R package that provides missing value imputation for multinomial (ie- categorical) variables. I'll provide an answer if you provide a valid question. – alexwhitworth Mar 10 '16 at 17:22

1 Answer

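kNN imputation seems to be a good way to solve this case. It is available as `kNN()` in the VIM package, where the columns to impute are passed through the `variable` argument. A simple call such as

    library(VIM)  # provides kNN()
    a <- kNN(df, variable = c("col1", "col2"), k = 6)

would do the imputation, although it is not advised when a large share of the values is missing.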
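To show what happens under the hood, here is a minimal base-R sketch of kNN imputation for mixed data. It is a hand-rolled illustration, not VIM's implementation: the helper names `gower_dist` and `knn_impute` are hypothetical, the toy data is made up (only the column names echo the question), and the columns used for the distance are assumed to be complete. For each row with a missing value it finds the `k` closest complete rows using a Gower-style distance, then fills a numeric gap with the median of the neighbours and a factor gap with their most frequent level. To my understanding these are also the default aggregations in VIM's `kNN()`.

```r
# Toy data (made-up values); rows 1 and 3 miss both VANUS and SUGU.
df <- data.frame(
  freq1  = c(24, 22, 100, 14, 2, 28, 95, 110),
  total1 = c(529, 155, 900, 366, 122, 560, 870, 950),
  VANUS  = c(NA, 59, NA, 39, 35, 41, 22, 30),
  SUGU   = factor(c(NA, "naine", NA, "naine", "naine", "naine", "mees", "mees"))
)

# Gower-style distance from row i to every row, using only dist_cols
# (assumed to contain no NAs): range-scaled absolute difference for
# numeric columns, 0/1 mismatch for all other columns.
gower_dist <- function(df, i, dist_cols) {
  parts <- do.call(cbind, lapply(dist_cols, function(col) {
    x <- df[[col]]
    if (is.numeric(x)) {
      rng <- diff(range(x))
      if (rng == 0) rep(0, length(x)) else abs(x - x[i]) / rng
    } else {
      as.numeric(as.character(x) != as.character(x[i]))
    }
  }))
  rowMeans(parts)
}

# Fill the NAs of one target column from its k nearest donor rows.
knn_impute <- function(df, target, dist_cols, k = 3) {
  out <- df
  donors <- which(!is.na(df[[target]]))          # rows that can donate a value
  for (i in which(is.na(df[[target]]))) {
    d <- gower_dist(df, i, dist_cols)[donors]
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    vals <- df[[target]][nearest]
    out[[target]][i] <- if (is.numeric(vals)) {
      median(vals)                               # numeric: median of neighbours
    } else {
      names(which.max(table(vals)))              # categorical: most frequent level
    }
  }
  out
}

imputed <- knn_impute(df, "VANUS", c("freq1", "total1"), k = 3)
imputed <- knn_impute(imputed, "SUGU", c("freq1", "total1"), k = 3)
print(imputed)
```

On a dataset of the size mentioned in the question, a loop like this would be slow, so a packaged implementation is the practical choice. The sketch is only meant to make the mechanism visible.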

Prashanth