
Question:

V is a vector with multiple NAs. Write a function to replace these NA values such that a missing value at index i should be replaced by the mean of the non-NA values at index p and q where |p – i| + |q – i| is minimized.

So, if my vector is (NA, 1, 2, NA, NA, 3), then my result needs to be (1.5, 1, 2, 1.5, 1.5, 3).

How can I write a nested for loop to produce this output?

henrycarteruk
Sumana
  • Welcome to StackOverflow. In this instance, probably `myVec[is.na(myVec)] <- mean(myVec, na.rm=TRUE)` will work. However, please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo Apr 14 '17 at 15:14
  • This question is about imputing. You may want to search for imputing in R. @Imo gives a great suggestion. – Arya McCarthy Apr 14 '17 at 16:21
  • Can you explain to me why the values at positions 1, 4 and 5 are 1.5? I'm writing the script but don't know how to assign the value at those positions. – gonzalez.ivan90 Apr 14 '17 at 18:49
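
As a quick check of the two ideas raised in the comments, here is a minimal sketch. The variable names (`v`, `global_fill`, `obs`, `nearest_two`) are my own and not from the thread. It shows what the global-mean one-liner returns for the example vector, and how the 1.5 at position 1 follows from the two nearest non-NA neighbours.

```r
v <- c(NA, 1, 2, NA, NA, 3)

# One-liner from the comment: every NA gets the mean of all observed values
global_fill <- v
global_fill[is.na(global_fill)] <- mean(global_fill, na.rm = TRUE)
print(global_fill)   # 2 1 2 2 2 3, not the output asked for in the question

# Position 1: find the two observed positions closest to it
obs <- which(!is.na(v))                 # positions 2, 3, 6
dist_from_1 <- abs(obs - 1)             # distances 1, 2, 5
nearest_two <- obs[order(dist_from_1)[1:2]]
print(mean(v[nearest_two]))             # 1.5, the mean of v[2] and v[3]
```

So the global mean does not reproduce the requested result here; the value at each NA depends on its own neighbours.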

2 Answers


You can use this one:

vect <- c(NA, 1, 2, NA, NA, 3)
flag <- is.na(vect) + 0
wh <- which(is.na(vect))
flag[flag == 1] <- wh
# flag holds one entry per element: a missing position stores its own index, a non-missing position stores 0
k <- 0
# k counts the NAs seen so far (capped at length(wh) - 1) and picks the upper end of the slice that is averaged
vect_ <- vect
# Final vector which will hold the result.
for (i in seq_along(vect)) {
  k <- ifelse(flag[i] > 0, k + 1, k)
  k <- ifelse(k == length(wh), k - 1, k)
  vect_[i] <- ifelse(flag[i] > 0,
                     mean(vect[min(wh):diff(c(1, wh[1 + k]))], na.rm = TRUE), vect[i])
}

vect_

> vect_
[1] 1.5 1.0 2.0 1.5 1.5 3.0
PKumar
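
To see which elements the loop in the answer above actually averages, the following sketch replays its index expression for the values `k` takes on the three NA iterations (1, 2, and then 3 capped back to 2). The replay loop and its printing are my own illustration, not part of the answer.

```r
vect <- c(NA, 1, 2, NA, NA, 3)
wh <- which(is.na(vect))   # NA positions: 1 4 5

# Replay the slice used for each NA; k is 1, 2, 2 for positions 1, 4, 5
for (k in c(1, 2, 2)) {
  idx <- min(wh):diff(c(1, wh[1 + k]))   # same index expression as in the answer
  cat("k =", k, "| slice", min(idx), "to", max(idx),
      "| mean =", mean(vect[idx], na.rm = TRUE), "\n")
}
```

Both slices start at position 1 (1 to 3, then 1 to 4), so each mean is taken over everything before the next NA rather than over the two nearest observed values. On other inputs the result may therefore differ from the rule stated in the question.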
dist_elem <- function(x, pos) {
  # Distance between pos and every position in vector x
  d <- rep(0, length(x))
  for (i in seq_along(x)) {
    d[i] <- abs(pos - i)
  }
  return(d)
}

# Example vector from the question
x <- c(NA, 1, 2, NA, NA, 3)

for (i in seq_along(x)) {

  if (is.na(x[i])) {
    # Distances from position i to all positions
    distances <- dist_elem(x, i)
    # Exclude position i itself
    distances[distances == 0] <- NA
    # Exclude positions that are still NA (x is updated in place,
    # so values filled in earlier iterations count as non-NA)
    distances[is.na(x)] <- NA
    # Sort (NAs are dropped) and keep the two smallest distances
    s <- sort(distances)[1:2]
    # The closest element (the first one in case of ties)
    x1 <- x[which(distances == s[1])[1]]
    # The second closest element; if s[1] == s[2] this picks the same element as x1
    x2 <- x[which(distances == s[2])[1]]

    out <- (x1 + x2) / 2

    x[i] <- out
  }

}

x
# [1] 1.5 1.0 2.0 1.5 1.5 3.0
Vasilis Vasileiou
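
For comparison, here is a minimal sketch of one strict reading of the rule in the question: each NA is replaced by the mean of the two nearest values that were non-NA in the original vector. The helper name `fill_nearest_two` is hypothetical, and breaking ties toward the lower index is my own assumption since the question does not specify it.

```r
# Hypothetical helper (not from the thread): fill each NA with the mean of the
# two nearest observed values, using only the original non-NA entries.
fill_nearest_two <- function(v) {
  obs <- which(!is.na(v))            # positions of observed values
  if (length(obs) < 2) {
    stop("need at least two non-NA values")
  }
  out <- v
  for (i in which(is.na(v))) {
    d <- abs(obs - i)                # distance from i to every observed position
    nearest <- obs[order(d)[1:2]]    # two smallest distances; ties go to the lower index
    out[i] <- mean(v[nearest])
  }
  out
}

result <- fill_nearest_two(c(NA, 1, 2, NA, NA, 3))
print(result)   # 1.5 1.0 2.0 1.5 2.5 3.0
```

Under this reading, position 5 comes out as 2.5 (the mean of the values at positions 6 and 3) rather than the 1.5 shown in the question. The expected output in the question therefore seems to assume something else, for example reusing values that were already filled in, as the in-place loop above does.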