How do I find unique sequences in a vector?

Question

I want to find the unique sequences in my vector. A sequence is a series of identical values. If a sequence repeats, it counts as two sequences, as long as there is another sequence in between. A sequence can have a length of one value.

If my function were called findSequences(), it would work like this:

my_vector = c('a', 'a', 'b', 'a', 'c', 'c', 'b')

findSequences(my_vector)

#[1] "a" "b" "a" "c" "b"

unique() and distinct() don't do this: they drop every repeated value, not just repeats that sit next to each other.
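For reference, here is a plain-loop sketch of the behaviour described above. It only illustrates the definition (the name find_sequences_loop is hypothetical), it assumes the vector contains no NA values, and growing the result with c() inside a loop is slow for long vectors.

```r
# Hypothetical helper: start a new sequence whenever the current value
# differs from the previous one. Assumes x has no NA values.
find_sequences_loop <- function(x) {
  out <- x[0]  # empty vector of the same type as x
  for (i in seq_along(x)) {
    if (i == 1L || x[i] != x[i - 1L]) {
      out <- c(out, x[i])
    }
  }
  out
}

my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')
find_sequences_loop(my_vector)
#[1] "a" "b" "a" "c" "b"
```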

score 8 · Accepted Answer · answered Dec 18 '19 at 14:08

You can use rle() and take the values of the run-length encoding:

rle(my_vector)$values
#[1] "a" "b" "a" "c" "b"
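Wrapped as the function the question asks for, this becomes a one-liner. A minimal sketch; the helper name find_sequences is hypothetical and only mirrors the question's example.

```r
# Hypothetical helper wrapping base R's rle(): returns one value per run
# of consecutive identical elements, in order of appearance.
find_sequences <- function(x) {
  rle(x)$values
}

my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')
find_sequences(my_vector)
#[1] "a" "b" "a" "c" "b"
```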

answered Dec 18 '19 at 14:08

GKi

score 4 · Answer 2 · answered Dec 18 '19 at 14:08

You can use comparisons with the preceding item:

my_vector[c(TRUE, my_vector[-1] != my_vector[-length(my_vector)])]

It can be a little faster than rle(), because rle() does this same comparison internally and then also computes the run lengths, which you don't need here.
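The same comparison can be packaged as a function. This is a sketch: the name find_sequences_cmp is hypothetical, the empty-vector guard is an addition not present in the answer, and it assumes the input has no NA values (a comparison with NA yields NA, which would put an NA into the result).

```r
# Hypothetical helper: keep an element when it differs from the one
# before it; the first element always starts a new sequence.
# Assumes x has no NA values.
find_sequences_cmp <- function(x) {
  n <- length(x)
  if (n == 0L) return(x)  # guard added here for empty input
  x[c(TRUE, x[-1] != x[-n])]
}

my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')
find_sequences_cmp(my_vector)
#[1] "a" "b" "a" "c" "b"
identical(find_sequences_cmp(my_vector), rle(my_vector)$values)
#[1] TRUE
```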

answered Dec 18 '19 at 14:08

Clemsang

score 2 · Answer 3 · answered Dec 18 '19 at 14:08

You can use the run-length encoding function rle():

rle(c('a', 'a', 'b', 'a', 'c', 'c', 'b'))
#Run Length Encoding
#  lengths: int [1:5] 2 1 1 2 1
#  values : chr [1:5] "a" "b" "a" "c" "b"

The values component holds what you need: one entry per sequence. The lengths component gives how many times each value repeats in a row.
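If the run lengths are also of interest, both components can be put side by side. A small sketch using only base R (the variable names r and runs are just for illustration); inverse.rle() expands the encoding back into the original vector, which is a quick way to check it.

```r
r <- rle(c('a', 'a', 'b', 'a', 'c', 'c', 'b'))

# One row per sequence: its value and how many times it repeats in a row
runs <- data.frame(value = r$values, length = r$lengths)
print(runs)

# inverse.rle() rebuilds the original vector from values and lengths
inverse.rle(r)
#[1] "a" "a" "b" "a" "c" "c" "b"
```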

answered Dec 18 '19 at 14:08

user2474226

score 2 · Answer 4 · answered Dec 18 '19 at 14:13

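We can also use data.table::rleid and duplicated to get unique sequences. rleid() gives every run of identical values its own id, so keeping the first element of each id yields one value per sequence.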

my_vector[!duplicated(data.table::rleid(my_vector))] 
#[1] "a" "b" "a" "c" "b"
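If you'd rather not depend on data.table, run ids can be built in base R with cumsum(). This is a sketch assuming the vector has no NA values; run_id is a hypothetical helper, not a data.table function, and for such input it should give the same ids as rleid().

```r
# Hypothetical base-R stand-in for data.table::rleid(): the id increases
# by one every time a value differs from the previous element.
# Assumes x has no NA values.
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-n]))
}

my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')
run_id(my_vector)
#[1] 1 1 2 3 4 4 5
my_vector[!duplicated(run_id(my_vector))]
#[1] "a" "b" "a" "c" "b"
```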

answered Dec 18 '19 at 14:13

Ronak Shah