How can I create a "group" vector that identifies runs of identical values in another vector?

From this

x <- c(0,1,0,0,1,0,1)

I want to create this

outcome <- c(1,2,3,3,4,5,6)

[1] 0 1 0 0 1 0 1
[1] 1 2 3 3 4 5 6

So, whenever a new run of identical values starts, the group number increases (the group label could also be something other than a number).


I do know ways to get there, but they are all hideous. The best I can come up with is

library(dplyr)   # lag() here is dplyr::lag, not stats::lag
library(tidyr)   # replace_na()
comparison <- x != lag(x)
cumsum(replace_na(comparison, TRUE))

but like I said - hideous. There must be a better way and I hope someone knows it.

Georgery
  • Possible duplicate: [*How to create a consecutive index based on a grouping variable in a dataframe*](https://stackoverflow.com/q/6112803/2204410) – Jaap Feb 16 '20 at 18:41
  • @Jaap I do not get it. Why do you close the question? The "duplicate" you linked does **not** answer this question here. Please, read more carefully before you close a question. – Georgery Feb 17 '20 at 08:45

3 Answers

We can use rleid from data.table

library(data.table)
rleid(x)
#[1] 1 2 3 3 4 5 6

Or in base R with rle

with(rle(x), rep(seq_along(values), lengths))
#[1] 1 2 3 3 4 5 6
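
To see why the base R one-liner works, it may help to look at what rle() returns for the example vector. This is only a sketch of the intermediate steps; the variable names r and group are illustrative and not part of the answer.

```r
x <- c(0, 1, 0, 0, 1, 0, 1)

# rle() compresses x into runs: one entry per run of identical values
r <- rle(x)
r$lengths   # 1 1 2 1 1 1  (how long each run is)
r$values    # 0 1 0 1 0 1  (the value repeated in each run)

# Number the runs 1..6, then repeat each number by its run length
group <- rep(seq_along(r$values), r$lengths)
group       # 1 2 3 3 4 5 6
```

Because the run numbers come from seq_along(), the result is an integer vector with the same length as x.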

Or, using an approach similar to the OP's

1 + cumsum(x != dplyr::lag(x, default = dplyr::first(x)))
akrun

If x is always only 0s and 1s, another option is

cumsum(c(1, (x[-1] + head(x, -1)) %% 2))

[1] 1 2 3 3 4 5 6
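
The modulo trick relies on x containing only 0s and 1s: the sum of two neighbours is odd exactly when they differ. If that assumption does not hold, the same shifted-comparison idea can be written with != instead. The sketch below assumes no NA values in the input (an NA would propagate through cumsum()); the helper name run_id is hypothetical and chosen only for illustration.

```r
# Hypothetical helper: base R run index for any atomic vector without NAs
run_id <- function(v) {
  if (length(v) == 0) return(integer(0))
  # Compare each element with its predecessor; TRUE marks the start of a new run
  changed <- v[-1] != head(v, -1)
  # The first element always starts run 1
  cumsum(c(TRUE, changed))
}

x <- c(0, 1, 0, 0, 1, 0, 1)
run_id(x)                      # 1 2 3 3 4 5 6
run_id(c("a", "a", "b", "a"))  # 1 1 2 3
```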
Andrew Gustar

A tidyverse version that flags each change with a condition, replaces the leading NA, and takes the cumulative sum:

library(tidyverse)

if_else(x == lag(x), 0, 1) %>%
  replace_na(1) %>% 
  cumsum()

[1] 1 2 3 3 4 5 6
nycrefugee