I am trying to write a function that replaces NA values with the column mean or median.

The code below works (mydata is a data frame):

data = mydata
type = mean

{ 
    for (i in which(sapply(data, is.numeric))){
        data[is.na(data[, i]), i] <- type(data[, i],  na.rm = TRUE)
    }
}

Why does the following code not work when I wrap it in a function?

impute <- function(data, type) { 
    for (i in which(sapply(data, is.numeric))) {
        data[is.na(data[, i]), i] <- type(data[, i],  na.rm = TRUE)
    }
}


impute(data=mydata,mean)
Riya
  • 1
    Can you include a sample dataset (e.g. `dput(mydata)`) to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Jthorpe May 16 '15 at 16:28
  • 1
    Your function needs a return value, like `data`, at the end. This is not a great way to code, though. What happens when you pass a function that doesn't have `na.rm` as an option? – Frank May 16 '15 at 16:37

1 Answer

To make @Frank's comment explicit: the function must return the modified data frame. You can do this either explicitly, as in:

impute <- function(data, type) { 
    for (i in which(sapply(data, is.numeric))) {
        data[is.na(data[, i]), i] <- type(data[, i],  na.rm = TRUE)
    }
    return(data)
}

or implicitly, as in:

impute <- function(data, type) { 
    for (i in which(sapply(data, is.numeric))) {
        data[is.na(data[, i]), i] <- type(data[, i],  na.rm = TRUE)
    }
    data
}

To update your data frame, assign the result of the impute call to a variable, like so:

newdata <- impute(data,mean)
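
To illustrate why the return value matters: R functions modify a local copy of their arguments, and a `for` loop evaluates to `NULL`, so the version in the question changes nothing the caller can see. The sketch below uses a small made-up data frame (the question does not show `mydata`, so the columns here are assumptions) to check that the original keeps its NAs while the returned copy does not.

```r
# Hypothetical sample data; the real mydata is not shown in the question.
mydata <- data.frame(a = c(1, NA, 3),
                     b = c("x", "y", "z"),
                     c = c(NA, 4, 6))

impute <- function(data, type) {
    for (i in which(sapply(data, is.numeric))) {
        data[is.na(data[, i]), i] <- type(data[, i], na.rm = TRUE)
    }
    data  # without this line the function's value is the loop's NULL
}

newdata <- impute(mydata, mean)

any(is.na(mydata))   # the original is untouched
any(is.na(newdata))  # the returned copy has been imputed
```

If you do not need to keep the original, assigning back to the same name (`mydata <- impute(mydata, mean)`) works just as well.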

If you want to avoid a for loop, here's a way:

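impute <- function(data, type) {
    # data[] <- lapply(...) keeps the data frame; sapply() would return a matrix
    data[] <- lapply(data, function(x) {
        # only touch numeric columns that have some, but not all, values missing
        if (is.numeric(x) && any(is.na(x)) && !all(is.na(x)))
            x[is.na(x)] <- type(x[!is.na(x)])
        x
    })
    data
}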
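
Frank's comment above asks what happens when the supplied function has no `na.rm` argument. One possible approach, sketched here, is to drop the missing values before calling the function, so that any summary function can be passed. The names `impute_any` and `most_common` and the sample data are hypothetical and only serve the illustration.

```r
# Hypothetical helper: the most frequent value. It has no na.rm argument.
most_common <- function(x) {
    counts <- table(x)
    as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical variant of the loop version that never passes na.rm.
impute_any <- function(data, type) {
    for (i in which(sapply(data, is.numeric))) {
        miss <- is.na(data[, i])
        # skip columns with nothing to fill or nothing to compute from
        if (any(miss) && !all(miss)) {
            data[miss, i] <- type(data[!miss, i])
        }
    }
    data
}

df  <- data.frame(n = c(2, 2, NA, 7), s = c("a", NA, "c", "d"))
out <- impute_any(df, most_common)
```

Here the missing entry in `n` is filled with 2, the most frequent observed value, and the non-numeric column `s` is left alone.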
Jthorpe
  • Thanks a bunch, it works like a charm. Could you please correct the following version? I wanted to avoid the for loop. It gives me the error "Error in match.fun(FUN)": `impute <- function(x,type) replace(x, is.na(x), type(x, na.rm = TRUE))`, `cols <- sapply(mydata, is.numeric)`, `ss1=data.frame(apply(mydata[,cols],2,impute(mydata[,cols],mean)))` – Riya May 16 '15 at 17:07
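
The `match.fun` error in the last comment most likely arises because `apply` expects a function as its third argument, whereas `impute(mydata[,cols],mean)` is a call that is evaluated first, so its result is passed instead of a function. A minimal sketch of one fix is to pass `impute` itself and supply `type` as an extra argument. It uses `lapply` rather than `apply`, since `apply` first converts the data frame to a matrix; the sample data is made up.

```r
impute <- function(x, type) replace(x, is.na(x), type(x, na.rm = TRUE))

# Hypothetical sample data; the real mydata is not shown in the question.
mydata <- data.frame(a = c(1, NA, 3),
                     b = c("x", "y", "z"),
                     c = c(NA, 4, 6))
cols <- sapply(mydata, is.numeric)

ss1 <- mydata
# pass the function itself; type = mean is forwarded to each impute() call
ss1[cols] <- lapply(mydata[cols], impute, type = mean)
```

This also keeps the non-numeric columns in `ss1`, which the `apply` version would have dropped.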