I'm used to using dplyr's lag() and lead() in my code, but I'm wondering -- is there a base R alternative?

For example, assume the following dataframe:

df<-data.frame(a=c("a","a","a","b","b"),stringsAsFactors=FALSE)

Using dplyr, I could do this to mark the start of each new group in `a`:

library(dplyr)
df %>% mutate(groupstart = a != lag(a) | is.na(lag(a)))
  a groupstart
1 a       TRUE
2 a      FALSE
3 a      FALSE
4 b       TRUE
5 b      FALSE

Is there a way to do this in base R?

Asked by iod, edited by Jaap
  • Just write a custom function. Something like [this one](https://stackoverflow.com/a/13128713/5635580) – Sotos Jun 28 '19 at 12:40
  • This seems to be a duplicate of [R shifting a vector](https://stackoverflow.com/questions/26997586/r-shifting-a-vector); have a look at the linked post and the answers of nograpes and petermeissner therein for a base R implementation of `lag`/`lead`; I've closed for now as a dupe but happy to re-open your question if I misunderstood – Maurits Evers Jun 28 '19 at 12:44
  • @MauritsEvers I am not sure it is a duplicate of that – Carles Jun 28 '19 at 12:53
  • @CarlesSansFuentes *"I'm used to using dplyr's lag() and lead() in my code, but I'm wondering -- is there a base R alternative?"* The dupe target addresses exactly that question. Can you elaborate on why you *don't* think this is a dupe? As I said, I may have misunderstood in which case I'll re-open. – Maurits Evers Jun 28 '19 at 12:55
  • @MauritsEvers What he is trying to do is find the first element of each run of identical values in a vector. Despite the fact that he is asking for the base R concept of `lag`, the purpose is a different one, and for me that is what the question is about. That is what I understand from the question and its purpose. Maybe it should be edited. What do you think? – Carles Jun 28 '19 at 13:01
  • @CarlesSansFuentes You may have a valid point, in which case this would be an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Although @iod's comment to your post below seems to suggest that he is in fact after a `lag`/`lead` function. Perhaps OP can clarify. – Maurits Evers Jun 28 '19 at 13:04
  • @MauritsEvers if anything, Sotos's link seems more relevant. But it looks like the answer is essentially "no, not directly". – iod Jun 28 '19 at 13:04
  • @CarlesSansFuentes as I mentioned, this is just an example I made up on the spot for a use. I could pick any of a hundred other times I used lag/lead in the past. I was just wondering if there's a relatively straightforward base-R alternative. – iod Jun 28 '19 at 13:06
  • @iod Both links provide methods to lag a vector in base R; the link @Sotos provides extends those methods to `data.frame`s. `lag`/`lead` applies to vectors, so to me the dupe target seems to be more fitting. – Maurits Evers Jun 28 '19 at 13:06
  • Okay, then I understood it wrong. Thank you for the explanation. – Carles Jun 28 '19 at 14:54

1 Answer

You could do something like this: build `lag_a` by prepending an `NA` to `df$a` with its last element dropped, then compare `lag_a` with `df$a`:

lag_a <- c(rep(NA, 1), head(df$a, length(df$a) - 1))
df$groupstart <- df$a != lag_a | is.na(lag_a)

#### OUTPUT ####

  a groupstart
1 a       TRUE
2 a      FALSE
3 a      FALSE
4 b       TRUE
5 b      FALSE
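A slightly more compact variant of the same idea, offered as a sketch: instead of building a padded `lag_a`, compare each element with its predecessor via negative indexing and prepend `TRUE` for the first row. `group_start` is a hypothetical helper name, and the sketch assumes the column contains no `NA` values (an `NA` would produce `NA` instead of `TRUE`/`FALSE`).

```r
# Hypothetical helper (not base R): TRUE where a new run of equal values begins.
# Assumes x contains no NA values.
group_start <- function(x) {
  if (length(x) == 0) return(logical(0))  # empty input: nothing to flag
  # x[-1] drops the first element and x[-length(x)] drops the last,
  # so the comparison pairs every element with the one before it.
  c(TRUE, x[-1] != x[-length(x)])
}

df <- data.frame(a = c("a", "a", "a", "b", "b"), stringsAsFactors = FALSE)
df$groupstart <- group_start(df$a)
df
#   a groupstart
# 1 a       TRUE
# 2 a      FALSE
# 3 a      FALSE
# 4 b       TRUE
# 5 b      FALSE
```

This avoids the `is.na()` check because the first row is set to `TRUE` explicitly rather than compared against an `NA` pad.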

You can generalize this principle in a function:

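lead_lag <- function(v, n) {
    # positive n lags (shifts values later), negative n leads (shifts values earlier)
    # clamp so that shifting by more than length(v) returns all NAs
    n <- sign(n) * min(abs(n), length(v))
    if (n > 0) c(rep(NA, n), head(v, length(v) - n))
    else c(tail(v, length(v) - abs(n)), rep(NA, abs(n)))
}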

#### OUTPUT ####

lead_lag(df$a, 2)  #[1] NA  NA  "a" "a" "a"
lead_lag(df$a, -2) #[1] "a" "b" "b" NA  NA
lead_lag(df$a, 3)  #[1] NA  NA  NA  "a" "a"
lead_lag(df$a, -4) #[1] "b" NA  NA  NA  NA
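dplyr's `lag()`/`lead()` also take a `default` argument for the fill value. Below is a sketch of the same shifting idea with that option added, written with index vectors instead of `head()`/`tail()`. `shift_vec` is a hypothetical name, not a base R function; it keeps the convention above that positive `n` lags and negative `n` leads, and it reproduces the question's `groupstart` flag.

```r
# Hypothetical helper (not part of base R): shift a vector by n positions.
# Positive n lags (values move later), negative n leads (values move earlier).
# Vacated positions are filled with `default`, similar to dplyr's argument.
shift_vec <- function(x, n = 1L, default = NA) {
  len <- length(x)
  k <- min(abs(n), len)               # never shift past the vector's length
  fill <- rep(default, k)
  if (n >= 0) {
    c(fill, x[seq_len(len - k)])      # drop the last k values, pad at the front
  } else {
    c(x[seq_len(len - k) + k], fill)  # drop the first k values, pad at the end
  }
}

a <- c("a", "a", "a", "b", "b")
shift_vec(a, 2)
# [1] NA  NA  "a" "a" "a"

# The question's group-start flag, built from a one-step lag
lag_a <- shift_vec(a, 1)
a != lag_a | is.na(lag_a)
# [1]  TRUE FALSE FALSE  TRUE FALSE
```

Note that `c()` coerces the fill and the data to a common type, so a character `default` on a numeric vector would turn the result into character.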
  • This is probably the closest thing to what I was looking for, so I'll give it my checkmark. – iod Jun 28 '19 at 13:08