4

I want to find the unique sequences in my vector. A sequence is a run of identical consecutive values. If a value appears again later, it counts as a separate sequence, as long as another sequence lies in between. A sequence can be a single value long.

So if my function were called find_Sequences(), it would work like this:

my_vector = c('a', 'a', 'b', 'a', 'c', 'c', 'b')

find_Sequences(my_vector)

> 'a', 'b', 'a', 'c', 'b'

unique() and distinct() don't do this.

Uwe Keim
petyar

4 Answers

8

You can use rle.

rle(my_vector)$values
#[1] "a" "b" "a" "c" "b"
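
As a minimal sketch, the call above can be wrapped in a function with the name used in the question. The name find_Sequences is taken from the question; the wrapper itself is only an illustration and assumes the input is a plain atomic vector.

```r
# Hypothetical wrapper; the name find_Sequences comes from the question.
find_Sequences <- function(x) {
  # rle() collapses each run of identical adjacent values into one entry;
  # $values holds one element per run, in order of appearance.
  rle(x)$values
}

my_vector <- c("a", "a", "b", "a", "c", "c", "b")
find_Sequences(my_vector)
#[1] "a" "b" "a" "c" "b"
```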
GKi
4

You can use comparisons with the preceding item:

my_vector[c(TRUE, my_vector[-1] != my_vector[-length(my_vector)])]

It should be lighter than rle, since rle performs the same comparison internally and then also computes the run lengths, which are not needed here.
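
To make the one-liner easier to follow, here is the same comparison split into steps. The intermediate names changed and keep are illustrative only and do not appear in the answer.

```r
my_vector <- c("a", "a", "b", "a", "c", "c", "b")

# Compare each element (from the 2nd on) with the element before it:
# TRUE wherever the value changes.
changed <- my_vector[-1] != my_vector[-length(my_vector)]
changed
#[1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE

# The first element always starts a run, so prepend TRUE.
keep <- c(TRUE, changed)
my_vector[keep]
#[1] "a" "b" "a" "c" "b"
```

Note that comparing against a missing value returns NA rather than TRUE or FALSE, so a vector containing NA may need separate handling with this approach.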

Clemsang
2

You can use the run length encoding rle function:

rle(c('a', 'a', 'b', 'a', 'c', 'c', 'b'))
Run Length Encoding
  lengths: int [1:5] 2 1 1 2 1
  values : chr [1:5] "a" "b" "a" "c" "b"

The values component contains the result you need.
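
A short sketch of working with the object that rle returns: both components can be extracted with $, and base R's inverse.rle rebuilds the original vector from them. The variable names x and runs are illustrative.

```r
x <- c("a", "a", "b", "a", "c", "c", "b")
runs <- rle(x)

# One entry per run, in order of appearance
runs$values
#[1] "a" "b" "a" "c" "b"

# How many times each run's value repeats
runs$lengths
#[1] 2 1 1 2 1

# inverse.rle() expands the runs back into the original vector
identical(inverse.rle(runs), x)
#[1] TRUE
```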

user2474226
2

We can also use data.table::rleid and duplicated to get unique sequences.

my_vector[!duplicated(data.table::rleid(my_vector))] 
#[1] "a" "b" "a" "c" "b"
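
The idea here is that every run gets its own integer id, and duplicated then keeps only the first element of each run. The sketch below builds such ids in base R with cumsum instead of data.table::rleid; it is a stand-in for illustration, assumes a non-empty vector without NA, and is not how data.table implements rleid.

```r
my_vector <- c("a", "a", "b", "a", "c", "c", "b")

# TRUE at the start of every run; the first element always starts one.
starts <- c(TRUE, my_vector[-1] != my_vector[-length(my_vector)])

# The running count of run starts gives each run a distinct id.
run_id <- cumsum(starts)
run_id
#[1] 1 1 2 3 4 4 5

# Keep only the first element of each run id.
my_vector[!duplicated(run_id)]
#[1] "a" "b" "a" "c" "b"
```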
Ronak Shah