I'm new to R, and I'm trying to estimate a missing value in a large microarray dataset with impute.knn() from library(impute), using 6 nearest neighbors.

Here's an example:

seq1 <- seq(1:12)
mat1 <- matrix(seq1, 3)
mat1[2,2] <- "NA"
impute.knn(mat1, k=6)

I get the following error:

Error in knnimp.internal(x, k, imiss, irmiss, p, n, maxp = maxp) : 
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion

I've also tried the following:

impute.knn(mat1[2,2], k=6)

and I get the following error:

Error in rep(1, p) : invalid 'times' argument

My Google-fu has been off today. Any suggestions as to why I might be getting this error?

edit: I've tried

mat1[2,2] <- NA 

as James suggested, but I get a segmentation fault. Using

replace(mat1, mat1[2,2], NA) 

does not help either. Any other suggestions?

Steve Hwang

1 Answer

I'm not sure why impute.knn is set up the way it is, but the example in ?impute.knn uses khanmiss, a data.frame of factors, which becomes a character matrix when coerced to a matrix.

You are getting a segmentation fault because you are trying to impute with k > ncol(mat1) nearest neighbours: you asked for k = 6, but mat1 has only 4 columns. It might be worth reporting a bug to the package authors, as this could easily be checked in R and return an error, rather than a C-level error which kills R.

mat1 <- matrix(as.character(1:12), 3)
mat1[2,2] <- NA # must not be quoted for it to be a NA value
# mat1 is a 4 column matrix so
impute.knn(mat1, 1)
impute.knn(mat1, 2)
impute.knn(mat1, 3)
impute.knn(mat1, 4)
# Will all work 
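To see why the quoting matters, here is a minimal base R sketch (it does not call impute at all). The names m_quoted and m_real are made up for illustration. It shows that "NA" in quotes is an ordinary string that turns the whole matrix into character, which would explain the "NAs introduced by coercion" warning in the question when the matrix is later forced to double.

```r
# Base R only; m_quoted and m_real are illustrative names, not from the question.
m_quoted <- matrix(1:12, 3)
m_quoted[2, 2] <- "NA"       # assigns the two-letter string "NA"
typeof(m_quoted)             # "character": every cell was converted to a string
anyNA(m_quoted)              # FALSE: there is no real missing value here

# Forcing the string back to a number is what produces the coercion warning.
coerced <- suppressWarnings(as.numeric(m_quoted[2, 2]))
is.na(coerced)               # TRUE

m_real <- matrix(1:12, 3)
m_real[2, 2] <- NA           # assigns a genuine missing value
typeof(m_real)               # "integer": the matrix keeps its numeric type
which(is.na(m_real), arr.ind = TRUE)   # row 2, column 2
```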

Note

Despite the strange example, impute.knn will work when mat1 is integer or double as well

mat1 <- matrix(1:12,3)
mat1[2,2] <- NA
impute.knn(mat1,2)

mat1 <- matrix(seq(0, 1, length.out = 12), 3) # 12 doubles from 0 to 1; seq(0,1,12) would return just 0
mat1[2,2] <- NA
impute.knn(mat1,2)

Take-home message

Don't try to impute using more information than you have.

Perhaps the package authors should take heed of

fortunes::fortune(15)

It really is hard to anticipate just how silly users can be. —Brian D. Ripley R-devel (October 2003)

and build in some error checking so a simple error does not cause a segfault.
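
Until something like that exists in the package, a rough sketch of such a check on the user side might look like the following. Both check_knn_args and safe_impute_knn are hypothetical helpers, not part of impute, and the k <= ncol(x) bound is simply the one observed above; the real limit inside the package may differ.

```r
# Hypothetical helper (not part of impute): validate arguments in R
# so a bad call fails with an ordinary error instead of reaching C code.
check_knn_args <- function(x, k) {
  if (!is.matrix(x)) {
    stop("x must be a matrix, got an object of class ", class(x)[1])
  }
  if (!is.numeric(k) || length(k) != 1 || is.na(k) || k < 1) {
    stop("k must be a single positive number")
  }
  # Assumption: k may not exceed the number of columns, as observed above.
  if (k > ncol(x)) {
    stop("k = ", k, " exceeds the number of columns (", ncol(x), ")")
  }
  invisible(TRUE)
}

# Hypothetical wrapper: only calls impute.knn once the checks have passed.
safe_impute_knn <- function(x, k, ...) {
  check_knn_args(x, k)
  if (!requireNamespace("impute", quietly = TRUE)) {
    stop("the impute package is not installed")
  }
  impute::impute.knn(x, k = k, ...)
}

mat1 <- matrix(1:12, 3)
mat1[2, 2] <- NA

# k = 6 on a 4-column matrix now gives a readable R error.
msg <- tryCatch(check_knn_args(mat1, 6),
                error = function(e) conditionMessage(e))
print(msg)
```

The same check also catches the impute.knn(mat1[2,2], k=6) attempt from the question, since a single element is not a matrix.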

mnel
  • I tried this on my dataset, but when I try to get the dimensions via dim(matrix), NULL is returned. Have any idea why? – Steve Hwang Oct 08 '12 at 01:07
  • If you post your data, then I could answer your question. If your data is called *my_data*, post the results of *dput(head(my_data))* – mnel Oct 08 '12 at 01:16
  • I ended up posting my problem here as I cannot share code in the comments: http://stackoverflow.com/questions/12774318/matrix-turns-into-something-else-when-i-run-it-through-impute-knn – Steve Hwang Oct 08 '12 at 01:33