I am trying to apply a series of functions to a range of columns, selected as everything between two named variables. I first need to impute the missing values and then normalize the variables. To impute, I use the following code.

for(i in (train$B365A:train$BSA)){
  data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
 }

In the loop above, I am trying to impute the missing values in every column from B365A to BSA; there are approximately 20 variables between them.

I have also come up with the following to convert the columns to numeric, but it does not change the cells.

convert_num <- function(i) {
 i <- as.numeric(i)
}
for (i in c(1:3)){
 convert_num(i)
}

The data looks similar to the following:

hope coal kite
3    4    5
2    1    5

Right now the columns have a non-numeric class, but they need to be numeric. The data has over 20 variables and 18k rows.

  • You sure [`impute`](http://www.thefreedictionary.com/impute) is the word you meant to use, and not `compute` or maybe `input`? (I'm not a statistician, so I don't know if impute has some specialized statistical definition, but I'm a native English speaker and it's not a word I've ever heard before) – Parthian Shot Oct 04 '17 at 00:46
  • Replacing the NA with the mean. I was using the term used in SAS, sorry. Impute: "assign (a value) to something by inference from the value of the products or processes to which it contributes." (Webster dictionary) – Kurt Bembridge Oct 04 '17 at 00:48
  • `mean(data[i])` when `i` is a single value is relatively meaningless. You won't get much help unless you provide some form of a [reproducible question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), since there is no clear indication of what `data` or `train` are. (And "impute" is used relatively correctly.) You might look at `zoo::na.approx` or `zoo::na.spline`. – r2evans Oct 04 '17 at 02:15
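As far as I can tell, `train$B365A:train$BSA` does not produce a range of column positions: `:` works on the values stored in those columns, not on their positions in the data frame. A minimal sketch of selecting the range by name instead is shown here. The `train` data frame and the `id` and `BWA` columns are hypothetical stand-ins, since the real data is not shown; rounding the mean follows the question's code.

```r
# Hypothetical stand-in for the asker's `train` data; only the column names
# B365A and BSA come from the question, everything else is made up.
train <- data.frame(
  id    = 1:6,
  B365A = c(2.5, NA, 3.1, 4.0, NA, 2.2),
  BWA   = c(2.4, 3.0, NA, 3.9, 2.8, 2.1),
  BSA   = c(NA, 3.2, 3.0, 4.1, 2.9, 2.0)
)

# Look up the positions of the first and last column of the range by name
first <- match("B365A", names(train))
last  <- match("BSA", names(train))

# Replace the NAs in each column of the range with that column's rounded mean
for (i in first:last) {
  col_mean <- round(mean(train[[i]], na.rm = TRUE))
  train[[i]][is.na(train[[i]])] <- col_mean
}
```

Using `[[i]]` extracts the column as a plain vector, so `mean()` receives numbers rather than a one-column data frame.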

1 Answer

If I understand correctly, the solution to your problem would be the following.

data <- data.frame(c1 = c(rbinom(10,5,0.5)),
                   c2 = c(rbinom(10,5,0.5)), 
                   c3 = c(rbinom(10,5,0.5)))
data[2:4, 1] <- NA
data[c(6, 8), 2] <- NA
data[10, 3] <- NA
data

# impute NAs in columns c1 to c3 with the rounded column mean
for (i in 1:3) {
  data[[i]][is.na(data[[i]])] <- round(mean(data[[i]], na.rm = TRUE))
}
data
data[] <- lapply(data, as.numeric) # convert every column to numeric
sapply(data, class)
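The question also mentions normalizing the variables after imputation but does not say which method is meant. As a rough sketch, assuming min-max scaling to the range [0, 1] is acceptable, the same `lapply` pattern can be reused. The `min_max` helper and the example values are hypothetical.

```r
# Hypothetical data frame whose columns are already imputed and numeric
data <- data.frame(
  c1 = c(1, 2, 3, 4),
  c2 = c(10, 20, 20, 40),
  c3 = c(5, 5, 6, 9)
)

# Min-max scaling to [0, 1]; assumes each column has at least two distinct
# values, otherwise the denominator is zero
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Assigning into scaled[] keeps the data frame structure and column names
scaled <- data
scaled[] <- lapply(data, min_max)
scaled
```

If z-scores are wanted instead, base R's `scale()` centers and scales each column, returning a matrix.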
Rafael Díaz