Last Observation Carried Forward In a data frame?

Question

I wish to implement a "Last Observation Carried Forward" for a data set I am working on which has missing values at the end of it.

Here is some simple code that does it (the question follows):

LOCF <- function(x)
{
    # Last Observation Carried Forward (for a left to right series)
    LOCF <- max(which(!is.na(x))) # the location of the Last Observation to Carry Forward
    x[LOCF:length(x)] <- x[LOCF]
    return(x)
}


# example:
LOCF(c(1,2,3,4,NA,NA))
LOCF(c(1,NA,3,4,NA,NA))
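For reference, this is what the two example calls return. The function fills only the trailing run of NAs after the last observed value. An interior NA, like the second element of the second example, is left as it is:

```r
LOCF <- function(x) {
  # position of the last non-missing value
  last <- max(which(!is.na(x)))
  # overwrite everything from there to the end with that value
  x[last:length(x)] <- x[last]
  x
}

LOCF(c(1, 2, 3, 4, NA, NA))   # 1 2 3 4 4 4
LOCF(c(1, NA, 3, 4, NA, NA))  # 1 NA 3 4 4 4 (the interior NA is untouched)
```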

Now this works well for simple vectors. But if I were to try to use it on a data frame:

a <- data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))
a
t(apply(a, 1, LOCF)) # will make a mess

It will turn my data frame into a character matrix.
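The coercion happens because apply() converts a data frame to a matrix with as.matrix() before looping over it, and a matrix holds a single type. Since the first column is character, every cell becomes a string. A quick way to check:

```r
a <- data.frame(rep("a", 4), 1:4, 1:4, c(1, NA, NA, NA))

# apply() works on this matrix, not on the data frame itself
m <- as.matrix(a)
typeof(m)            # "character"

# so the row-wise result is character as well
res <- t(apply(a, 1, function(row) row))
typeof(res)          # "character"
```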

Can you think of a way to do LOCF on a data.frame without turning it into a matrix? (I could use loops and such to clean up the mess, but would love a more elegant solution.)

score 23 · Accepted Answer · answered May 05 '10 at 19:31


This already exists:

library(zoo)
na.locf(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA)))


Shane



+1, and rseek.org of course immediately returns this as the first result. – Dirk Eddelbuettel May 05 '10 at 19:34
My bad for not rseeking it; thanks, Shane. But I am afraid it doesn't do the job (it fills down the column instead of along each row). – Tal Galili May 05 '10 at 19:45

You could have also found this if you searched stackoverflow.com for `[r] locf`. – Shane May 05 '10 at 19:47
Hi Shane, I also wasn't able to find a solution in that search (although this thread is nice: http://stackoverflow.com/questions/1782704/propagating-data-within-a-vector/1783275#1783275 ) – Tal Galili May 05 '10 at 19:53
Look at the accepted answer to that thread. That's what I was referring to. I don't think this question is a duplicate because the other questioner was asking about vectors and you're asking about data frames, but they're very closely related (and the answer is the same). – Shane May 05 '10 at 20:06
Hi Shane, the function can be used like this: t(na.locf(t(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))))) But it will not "solve" the question, since I would need to go through the resulting "matrix" and turn it back to a data.frame. And thanks for taking the time to help :) Tal – Tal Galili May 05 '10 at 20:38
Oh...you want to carry column values "forward"? That isn't usually what people do. An "observation" is a row value in R, so LOCF means carry row values downward. You're carrying values across columns. I can't even imagine a circumstance in which one would do that? – Shane May 05 '10 at 20:50
Hi Shane, it's very simple. I have a wide (instead of long) data.frame. I could turn it into long format and then use a function from the other SO thread. The only problem with that would be the case of the first value being missing... – Tal Galili May 05 '10 at 20:59

If the first value is missing, then you can make a judgement about what to do to handle it. No function will solve that problem for you. You will need to either leave the whole thing as missing, or set a default first value (like zero, for instance). – Shane May 05 '10 at 21:01
I don't see why turning the matrix back to a data.frame with `data.frame(t(na.locf(t(dat))))` should be a problem. And following `na.locf(dat)` with `na.locf(dat, fromLast = TRUE)` should carry next observations backward (NOCB) and fill first missing values. No? So: `data.frame(t(na.locf(na.locf(t(dat)),fromLast=T)))` – Oct 18 '14 at 11:48

score 11 · Answer 2 · answered Jan 19 '17 at 21:38


If you do not want to load a big package like zoo just for the na.locf function, here is a short solution that also works when the input vector has leading NAs (they are left as NA).

na.locf <- function(x) {
  v <- !is.na(x)
  c(NA, x[v])[cumsum(v)+1]
}


Henrik Seidel


I like this solution best. If you want to apply it to a `data.frame` like in the original question, you can use it via `a[]=lapply(a,na.locf)`. – cryo111 Dec 14 '17 at 14:10
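A short demonstration of both points: leading NAs stay NA, and the a[] <- lapply(...) idiom keeps a as a data.frame with each column's type intact. Note that lapply() iterates over columns, so this carries values down each column rather than across each row as the question asks. The column names col1 to col4 are added here only for readability.

```r
na.locf <- function(x) {
  v <- !is.na(x)
  # slot 1 of the lookup vector is NA, so leading NAs map to NA;
  # every later position maps to the most recent observed value
  c(NA, x[v])[cumsum(v) + 1]
}

na.locf(c(NA, 1, NA, 3, NA))    # NA 1 1 3 3

a <- data.frame(col1 = rep("a", 4), col2 = 1:4,
                col3 = 1:4, col4 = c(1, NA, NA, NA))
a[] <- lapply(a, na.locf)       # assigning into a[] keeps the data.frame
str(a)                          # col4 is now 1 1 1 1; col2 is still integer
```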

score 10 · Answer 3 · answered Jul 14 '17 at 10:16

Adding the newer tidyr::fill() function, which carries the last observation in a column forward to fill in NAs:

library(tidyr)  # attaches fill() and re-exports the %>% pipe

a <- data.frame(col1 = rep("a",4), col2 = 1:4, 
                col3 = 1:4, col4 = c(1,NA,NA,NA))
a
#   col1 col2 col3 col4
# 1    a    1    1    1
# 2    a    2    2   NA
# 3    a    3    3   NA
# 4    a    4    4   NA

a %>% tidyr::fill(col4)
#   col1 col2 col3 col4
# 1    a    1    1    1
# 2    a    2    2    1
# 3    a    3    3    1
# 4    a    4    4    1
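In recent tidyr versions fill() also takes a .direction argument, which covers the leading-NA case raised in the comments under the accepted answer: "downup" fills downwards first and then fills any remaining leading NAs upwards. A small sketch with a made-up data frame b:

```r
library(tidyr)

b <- data.frame(id = 1:4, x = c(NA, 2, NA, NA))

fill(b, x)                          # x: NA 2 2 2 (default .direction = "down")
fill(b, x, .direction = "downup")   # x: 2 2 2 2 (leading NA filled from below)
```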

Answer 4 · Steffen Moritz · 2022-06-29

There are a number of packages implementing exactly this functionality (the basic behavior is the same, but they differ in their additional options):

spacetime::na.locf
imputeTS::na_locf
zoo::na.locf
xts::na.locf
tidyr::fill

Added a benchmark of these methods for @Alex:

I used the microbenchmark package and the tsNH4 time series, which has 4552 observations. These are the results:

So in this case na_locf from imputeTS was the fastest, closely followed by na.locf0 from zoo. The other methods were significantly slower. But be careful: this is only a benchmark on one specific time series. (I added the code so you can test your own use case.)

Results as a plot:

Here is the code, if you want to recreate the benchmark with a time series of your choice:

library(microbenchmark)
library(imputeTS)
library(zoo)
library(xts)
library(spacetime)
library(tidyr)

# Create a data.frame from the tsNH4 series (tidyr::fill needs a data frame)
df <- as.data.frame(tsNH4)

res <- microbenchmark(imputeTS::na_locf(tsNH4),
                    zoo::na.locf0(tsNH4),
                    zoo::na.locf(tsNH4), 
                    tidyr::fill(df, everything()), 
                    spacetime::na.locf(tsNH4), 
                    times = 100)
ggplot2::autoplot(res)

plot(res)

# code just to show each methods produces correct output
spacetime::na.locf(tsNH4)
imputeTS::na_locf(tsNH4)
zoo::na.locf(tsNH4)
zoo::na.locf0(tsNH4)
tidyr::fill(df, everything())

Also tidyverse has an equivalent fill() function. It would be great to have something fast in data.table. — skan, May 25 '17 at 08:58

score 2 · Answer 5 · answered Apr 09 '13 at 15:26


This question is old, but for posterity: the best solution is to use the data.table package with roll = TRUE (a rolling join).


Dave31415



fill out with an example – mnel Apr 11 '13 at 05:18
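The answer gives no example, so here is a minimal sketch of a rolling join (the tables obs and grid and their columns are made up). With roll = TRUE, each time in grid that has no exact match takes the most recent earlier observation. This fills a grid of join keys rather than NAs that already sit in a column; for that case recent data.table versions also provide nafill(x, type = "locf") for numeric columns.

```r
library(data.table)

# observed values at times 1 and 3 only
obs  <- data.table(t = c(1L, 3L), value = c(10, 30))
# the full set of times we want values for
grid <- data.table(t = 1:5)

# roll = TRUE: an unmatched t takes the value from the previous matched t
filled <- obs[grid, on = "t", roll = TRUE]
filled$value   # 10 10 30 30 30
```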

score 0 · Answer 6 · answered May 05 '10 at 21:01

I ended up solving this using a loop:

fillInTheBlanks <- function(S) {
  L <- !is.na(S)
  c(S[L][1], S[L])[cumsum(L)+1]
}


LOCF.DF <- function(xx)
{
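    # a leading NA in a row is back-filled with that row's first non-NA value

    orig.class <- lapply(xx, class)

    # apply() returns a character matrix; keep its columns as character
    # (not factors) so the class conversions below work in any R version
    new.xx <- data.frame(t( apply(xx,1, fillInTheBlanks) ), stringsAsFactors = FALSE)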

    for(i in seq_along(orig.class))
    {
        if(orig.class[[i]] == "factor") new.xx[,i] <- as.factor(new.xx[,i])
        if(orig.class[[i]] == "numeric") new.xx[,i] <- as.numeric(new.xx[,i])
        if(orig.class[[i]] == "integer") new.xx[,i] <- as.integer(new.xx[,i])   
    }

    #t(na.locf(t(a)))

    return(new.xx)
}

a <- data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))
LOCF.DF(a)
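Since the question asks for carrying values across the columns of each row, here is a hedged sketch that avoids the matrix round-trip by looping over columns instead of rows: each column's NAs are filled from the column to its left, which has already been filled, so values propagate rightwards. The function name locf_across is made up for this sketch. Columns without NAs are never touched, so they keep their types; a column that does receive values can still be coerced if its left neighbour has a different type (for example, numbers copied into a character column).

```r
locf_across <- function(df) {
  # start at column 2; column 1 has nothing to its left to borrow from
  for (j in seq_along(df)[-1]) {
    miss <- is.na(df[[j]])
    # skip columns with no NAs: even an empty assignment would coerce
    # the column to the type of its left neighbour
    if (any(miss)) {
      # column j - 1 has already been filled, so values propagate rightwards
      df[[j]][miss] <- df[[j - 1]][miss]
    }
  }
  df
}

a <- data.frame(rep("a", 4), 1:4, 1:4, c(1, NA, NA, NA))
res <- locf_across(a)
res[[4]]                 # 1 2 3 4: each NA takes column 3's value in its row

d <- data.frame(x = c(1, 5), y = c(NA, 6), z = c(NA, NA))
locf_across(d)$z         # 1 6: values carried across two columns
```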

score 0 · Answer 7 · answered Jul 22 '15 at 13:48

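Instead of apply() you can use lapply(), which iterates over the columns of the data frame without converting it to a matrix, and then turn the resulting list back into a data.frame. Note that this carries values down each column, not across each row.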

LOCF <- function(x) {
    # Last Observation Carried Forward (for a left to right series)
    LOCF <- max(which(!is.na(x))) # the location of the Last Observation to Carry Forward
    x[LOCF:length(x)] <- x[LOCF]
    return(x)
}

a <- data.frame(rep("a",4), 1:4, 1:4, c(1, NA, NA, NA))
a
data.frame(lapply(a, LOCF))
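One caveat: if a column is entirely NA, which() returns an empty vector, max() returns -Inf with a warning, and the indexing step then fails with an error. A hedged variant that simply returns such a column unchanged (the column names in the example are made up):

```r
LOCF <- function(x) {
  obs <- which(!is.na(x))
  # an all-NA vector has no observation to carry forward
  if (length(obs) == 0) return(x)
  last <- max(obs)
  x[last:length(x)] <- x[last]
  x
}

a <- data.frame(col1 = rep("a", 4), col2 = c(1, NA, NA, NA), col3 = NA)
res <- data.frame(lapply(a, LOCF))
res$col2   # 1 1 1 1
res$col3   # still all NA, and no error
```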
