Possible Duplicate:
Replacing NAs with latest non-NA value

How can I fill in missing values in each column using the previous non-missing value?

Date.end   Date.beg   Pollster Serra.PSDB
2012-06-26 2012-06-25  Datafolha       31.0
2012-06-27       <NA>       <NA>         NA
2012-06-28       <NA>       <NA>         NA
2012-06-29       <NA>       <NA>         NA 
2012-06-30       <NA>       <NA>         NA
2012-07-01       <NA>       <NA>         NA
2012-07-02       <NA>       <NA>         NA
2012-07-03       <NA>       <NA>         NA
2012-07-04       <NA>       Ibope        22
2012-07-05       <NA>       <NA>         NA
2012-07-06       <NA>       <NA>         NA
2012-07-07       <NA>       <NA>         NA
2012-07-08       <NA>       <NA>         NA
2012-07-09       <NA>       <NA>         NA
2012-07-10       <NA>       <NA>         NA
2012-07-11       <NA>       <NA>         NA
2012-07-12 2012-07-09     Veritá       31.4
daniel

1 Answer

I'm not sure this is the best way to do it; there is probably a package out there with exactly this functionality. The following approach may not have the best performance, but it works and should be fine for small to medium datasets. I would be cautious about applying it to very large datasets (more than a million rows or so), because it scans the filled positions once for every NA.

fillNAByPreviousData <- function(column) {
    # First we find out which entries are NA
    navals <- which(is.na(column))
    # and which entries are filled with data.
    filledvals <- which(!is.na(column))

    # If no NAs followed each other, navals - 1 would give the entries we
    # need. In our case, however, we have to find the last filled entry
    # before each NA. We may do this using the following vapply trick:
    fillup <- vapply(navals, function(x) {
        prev <- filledvals[filledvals < x]
        # A leading NA has no previous value, so it stays NA.
        if (length(prev) > 0) max(prev) else NA_integer_
    }, integer(1))

    # And finally replace the NAs with our data.
    column[navals] <- column[fillup]
    column
}

Here is an example using a test dataset:

set.seed(123)
test <- 1:20
test[floor(runif(5,1, 20))] <- NA

> test
 [1]  1  2  3  4  5 NA  7 NA  9 10 11 12 13 14 NA 16 NA NA 19 20

> fillNAByPreviousData(test)
 [1]  1  2  3  4  5  5  7  7  9 10 11 12 13 14 14 16 16 16 19 20
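
For larger inputs, or to process a whole data frame in one go, a vectorized variant may be worth trying. The following is only a sketch: `fill_prev` is a hypothetical helper name, the `cummax` index trick is a swapped-in technique rather than the function above, and `polls` is a four-row excerpt of the data in the question. Packages such as zoo also provide this operation as `na.locf()`.

```r
# Hypothetical helper: carry the last non-NA value forward using
# only vectorized base R functions.
fill_prev <- function(column) {
    filled <- !is.na(column)
    # For each position, the index of the most recent non-NA entry
    # (0 where no non-NA entry has been seen yet).
    idx <- cummax(seq_along(column) * filled)
    # Index 0 cannot look up a value, so map it to NA;
    # leading NAs therefore stay NA.
    idx[idx == 0] <- NA
    column[idx]
}

fill_prev(c(NA, 1, NA, NA, 4, NA))
# [1] NA  1  1  1  4  4

# Four rows taken from the question's data frame.
polls <- data.frame(
    Date.end   = as.Date(c("2012-06-26", "2012-06-27",
                           "2012-07-04", "2012-07-05")),
    Date.beg   = as.Date(c("2012-06-25", NA, NA, NA)),
    Pollster   = c("Datafolha", NA, "Ibope", NA),
    Serra.PSDB = c(31.0, NA, 22, NA),
    stringsAsFactors = FALSE
)

# lapply() handles each column separately, so every column keeps its
# own type; polls[] keeps the result a data frame.
polls[] <- lapply(polls, fill_prev)
print(polls)
```

`lapply` is used here instead of `apply` because `apply` first converts the data frame to a matrix, which would coerce the date and numeric columns to character.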
Thilo
  • It works, thanks. However, I had to repeat the task many times, since the solution does not work on a whole data frame. – daniel Nov 22 '12 at 23:44
  • You could have done that using `apply`. However, the answer in the duplicate question is probably much faster than mine. – Thilo Nov 23 '12 at 06:53