I'm following along with this question here: efficiently locf by groups in a single R data.table

This seems perfect for my data, as I have grouped data with multiple columns where I am trying to carry the last observation forward. However, I would like to limit how far forward it is carried. The relevant part of the code is !is.na(x). Say I want to limit it to two: given the sequence TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE, I would like to get TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE.

This in effect carries a value of TRUE forward up to n times (very similar to xts), which seems to make this method redundant compared with xts.na.locf, but I'm hoping there is an efficient way to do this that avoids xts. Thanks for any help.

Almacomet

1 Answer

One possibility is to modify the Run Length Encoding of the vector by shifting the unwanted repetitions of FALSE onto the next TRUE:

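mx <- 2  # maximum number of consecutive FALSEs to keep
v <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
r <- rle(v)
# If v ends in FALSE, append an empty TRUE run to absorb the excess
if (!r$values[length(r$values)]) {
  r$values <- c(r$values, TRUE)
  r$lengths <- c(r$lengths, 0)
}
# Excess length of each FALSE run beyond mx (zero for TRUE runs)
changes <- pmax(0, r$lengths - mx) * (r$values == FALSE)
# Remove the excess from each FALSE run and add it to the following TRUE run
r$lengths <- r$lengths - changes + c(0, head(changes, -1))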

You'd obviously have to test if this is more efficient for your use case.

Edit: Output is as expected:

> print(inverse.rle(r))
 [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
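
For reuse, the same steps could be wrapped in a function. This is only a sketch: the name cap_false_runs and its argument names are made up for illustration, and it assumes v is a non-empty logical vector without NA. The example call uses an input that ends in FALSE, so the zero-length TRUE run gets appended.

```r
# Hypothetical helper; name and arguments are illustrative, not part of any package.
# Assumes v is a non-empty logical vector with no NA values.
cap_false_runs <- function(v, mx) {
  r <- rle(v)
  # Guarantee a trailing TRUE run (possibly of length 0) to receive the excess
  if (!r$values[length(r$values)]) {
    r$values <- c(r$values, TRUE)
    r$lengths <- c(r$lengths, 0)
  }
  # Excess of each FALSE run beyond mx, zero for TRUE runs
  changes <- pmax(0, r$lengths - mx) * (r$values == FALSE)
  # Shift each excess onto the TRUE run that follows
  r$lengths <- r$lengths - changes + c(0, head(changes, -1))
  inverse.rle(r)
}

# Input ending in a run of four FALSEs: the first two stay FALSE, the rest become TRUE
res_tail <- cap_false_runs(c(TRUE, FALSE, FALSE, FALSE, FALSE), mx = 2)
print(res_tail)
```

Within a data.table this could presumably be applied per group and per column, but that part is untested here.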

Edit 2: Short explanation:

  • pmax(0,r$lengths-mx) is a vector whose components are either zero (if the run length is at most mx) or the difference between the run length and mx. Since only the repetitions of FALSE are relevant, it has to be multiplied by (r$values == FALSE), which zeroes every entry of the vector that corresponds to a TRUE run.
  • Because of the if block, the last element of r$values is known to be TRUE. Thus we can move the unwanted FALSEs onto the following TRUE. This is achieved by first subtracting them from the number of FALSEs and then adding them to the number of TRUEs. Since the last entry of changes belongs to a TRUE run (and is therefore zero), c(0,head(changes,-1)) simply shifts all changes (for FALSE) one position to the right (and thus onto a TRUE).
  • Perfect, you even handled cases where it ends in FALSE elegantly. I was wondering if you could explain the last two lines of your code (changes and modifying r lengths) for my own understanding? I was attempting to use if statements with cases, but this is cleaner and faster. Thanks! – Almacomet Feb 09 '17 at 00:52
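
The last two lines can also be traced by hand on a small vector. The sketch below uses a different input than the question, chosen so that a TRUE run is longer than mx as well, which shows why the multiplication by (r$values == FALSE) matters. The intermediate names excess, shifted and new_lengths are only for illustration.

```r
mx <- 2
w <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
r <- rle(w)
# r$lengths is 3 3 1, r$values is TRUE FALSE TRUE (already ends in TRUE)

excess <- pmax(0, r$lengths - mx)          # 1 1 0: both long runs exceed mx
changes <- excess * (r$values == FALSE)    # 0 1 0: the TRUE run is masked out
shifted <- c(0, head(changes, -1))         # 0 0 1: excess moved one run to the right
new_lengths <- r$lengths - changes + shifted  # 3 2 2

r$lengths <- new_lengths
out <- inverse.rle(r)
print(out)  # TRUE TRUE TRUE FALSE FALSE TRUE TRUE
```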