
Thanks for taking the time to look at my problem! I am new to the forum and relatively new to R, but I'll do my best to formulate the question clearly.

I have a big set of tree sample data with an irregular number of rows per individual. In the "class" column (here column 2), the first row of each individual has a value (1, 2, 3 or 4) and the subsequent rows are NA. I'd like to copy the first value of each individual into the subsequent NA cells belonging to the same individual.

Reproducible example dataframe (edited):

test <- cbind(c(1, 2, 3, NA, 4, NA, NA, NA, 5, NA, 6), c(3, 4, 3, NA, 1, NA, NA, NA, 2, NA, 1))
colnames(test) <- c("ID", "class")

        ID  class
 [1,]    1    3
 [2,]    2    4
 [3,]    3    3
 [4,]   NA   NA
 [5,]    4    1
 [6,]   NA   NA
 [7,]   NA   NA
 [8,]   NA   NA
 [9,]    5    2
[10,]   NA   NA
[11,]    6    1

The result I am looking for is this:

      ID class
 [1,]  1     3
 [2,]  2     4
 [3,]  3     3
 [4,] NA     3
 [5,]  4     1
 [6,] NA     1
 [7,] NA     1
 [8,] NA     1
 [9,]  5     2
[10,] NA     2
[11,]  6     1

I copied the last solution from the topic "How to substitute several NA with values within the DF using if-else in R?" and tried to adapt it to my needs like this:

    test2 <- as.data.frame(t(apply(test["class"], 1, function(x)
    if(is.na(x[1]) == FALSE && all(is.na(head(x[1], -1)[-1])))
    replace(x, is.na(x), x[1]) else x)))

but it gives me the error "dim(x) must have positive length". I tried many other versions and got all sorts of errors, so I don't even know where to start. How can I improve it?

Brazza
    I would use the `na.locf` function (from package `zoo`), or a [base-R alternative](http://stackoverflow.com/questions/19838735/how-to-na-locf-in-r-without-using-additional-packages), to propagate the previous value over the consecutive NAs, i.e. `test[,2] <- na.locf(test[,2])` – digEmAll Mar 03 '15 at 21:31
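
A minimal sketch of the `na.locf` suggestion from the comment above, assuming the `zoo` package is installed. The `requireNamespace` guard and the `na.rm = FALSE` argument are additions for illustration and are not part of the comment; `na.rm = FALSE` keeps the result the same length as the input even if the column were to start with NA.

```r
# Rebuild the example matrix from the question
test <- cbind(ID    = c(1, 2, 3, NA, 4, NA, NA, NA, 5, NA, 6),
              class = c(3, 4, 3, NA, 1, NA, NA, NA, 2, NA, 1))

# Only run if zoo is available (assumption: it may not be installed)
if (requireNamespace("zoo", quietly = TRUE)) {
  # Carry the last observed value forward over each run of NAs;
  # na.rm = FALSE keeps any leading NAs instead of dropping them
  test[, "class"] <- zoo::na.locf(test[, "class"], na.rm = FALSE)
}
```

Only the "class" column is filled here; the "ID" column keeps its NAs, as in the desired result.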

1 Answer

Here's a little one-line function that'll work, in case you don't want to load another package:

rollForward <- function(x) {
    c(NA, x[!is.na(x)])[cumsum(!is.na(x)) + 1]
}

test[,"class"] <- rollForward(test[,"class"])
test
#       ID class
#  [1,]  1     3
#  [2,]  2     4
#  [3,]  3     3
#  [4,] NA     3
#  [5,]  4     1
#  [6,] NA     1
#  [7,] NA     1
#  [8,] NA     1
#  [9,]  5     2
# [10,] NA     2
# [11,]  6     1
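
To see why the one-liner works, it may help to split it into its intermediate steps. This is only an illustrative breakdown of the same expression; the variable names `observed`, `values`, `idx` and `filled` are made up for this sketch.

```r
x <- c(3, 4, 3, NA, 1, NA, NA, NA, 2, NA, 1)

# TRUE wherever a real value is present
observed <- !is.na(x)

# Lookup table: a leading NA followed by the observed values in order.
# The leading NA is what gets returned for any NAs at the very start of x.
values <- c(NA, x[observed])

# cumsum() counts how many observed values have been seen so far, so every
# NA gets the same count as the last observed value before it; the +1
# skips over the leading NA in the lookup table
idx <- cumsum(observed) + 1

filled <- values[idx]
filled
```

Because the lookup table starts with NA, a vector that begins with missing values keeps them: the same steps applied to `c(NA, 5, NA)` give `NA 5 5`.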
Josh O'Brien