Deconstruct vector by value

Question

Given a partially sorted vector:

A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1)

The aim is to deconstruct this vector into a table that shows each run of identical values and the index range it covers:

 start end value
     1   5     1
     6  10     0
    11  15     2
    16  20    -1

I tried using the diff function but cannot find a good way to group consecutive values into the required ranges.
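The diff() idea can work on its own: a new run begins at position 1 and wherever an element differs from its predecessor. Below is a minimal base-R sketch, assuming A is non-empty and contains no NAs (diff() would propagate them) and that exact equality is meaningful for the values (true for the whole numbers here). The names start, end and runs are illustrative only.

```r
A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1)

# diff(A)[i] != 0 means A[i + 1] differs from A[i], so i + 1 starts a new run;
# position 1 always starts the first run
start <- c(1, which(diff(A) != 0) + 1)

# Each run ends one position before the next run starts; the last run ends at length(A)
end <- c(start[-1] - 1, length(A))

runs <- data.frame(start = start, end = end, value = A[start])
runs
```

Because it only compares neighbours, the same code also covers a value that reappears later: appending 1, 1, 1 would add a separate row with start 21, end 23, value 1.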

Would identical values always be together? What if you have a vector `A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1, 1, 1, 1)` ? — Ronak Shah, Nov 24 '18 at 15:00
The vector is in reality much longer and the distinct values repeat, so in that case I'd need another row in the table reading 21 23 1 (so it's about runs in the sequence, not unique values). — Maximilian Niroomand, Nov 24 '18 at 15:09
Related: [How can I find the indices of a continuous string of numbers?](https://stackoverflow.com/questions/40368551/how-can-i-find-the-indices-of-a-continuous-string-of-numbers); [Find start and end positions/indices of runs/consecutive values](https://stackoverflow.com/questions/43875716/find-start-and-end-positions-indices-of-runs-consecutive-values) — Henrik, Nov 24 '18 at 22:47

AkselA · Accepted Answer · 2018-11-24T22:27:42.603

Using rle() (run-length encoding), which records the length and value of each run of equal consecutive elements:

A <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,
      -1, -1, -1, -1, -1, 1, 1, 1, 0, 0, 0, 0)
rled <- as.data.frame(unclass(rle(A)))

rled$end <- cumsum(rled$lengths)
rled$start <- rled$end - rled$lengths + 1

rled[, c("start", "end", "values")]


#   start end values
# 1     1   5      1
# 2     6  10      0
# 3    11  15      2
# 4    16  20     -1
# 5    21  23      1
# 6    24  27      0
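The same rle() steps can be wrapped in a small reusable function that returns the column names used in the question (start, end, value). This is a sketch; run_table is a hypothetical name, not a base R function.

```r
# Hypothetical helper: one row per run of equal consecutive elements of an atomic vector x
run_table <- function(x) {
  r <- rle(x)                     # r$lengths = run lengths, r$values = value of each run
  end <- cumsum(r$lengths)        # running total of lengths = last index of each run
  start <- end - r$lengths + 1L   # step back over the run to get its first index
  data.frame(start = start, end = end, value = r$values)
}

A <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,
       -1, -1, -1, -1, -1, 1, 1, 1, 0, 0, 0, 0)
run_table(A)
```

One caveat: rle() regards a missing value as unequal to the previous element even if that is also NA, so each NA in the input becomes its own one-element run.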

Ronak Shah · Answer 2 · 2018-11-24T16:03:18.310

We can use rleid from data.table, which gives every run of consecutive identical values its own id. We then loop over each run id, find its first and last position in the original sequence, and convert the result into a data.frame.

library(data.table) 

indx <- rleid(A)
new_dat <- data.frame(t(sapply(unique(indx), function(x) {
  val <- which(indx == x)
  c(start = min(val), stop = max(val))
})))

transform(new_dat, value = A[new_dat$start])


#  start stop value
#1     1    5     1
#2     6   10     0
#3    11   15     2
#4    16   20    -1

When a value reappears later in the vector (non-adjacent runs of the same value):

A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1, 1, 1, 1)

indx <- rleid(A)
new_dat <- data.frame(t(sapply(unique(indx), function(x) {
  val <- which(indx == x)
  c(start = min(val), stop = max(val))
})))

transform(new_dat, value = A[new_dat$start])


#  start stop value
#1     1    5     1
#2     6   10     0
#3    11   15     2
#4    16   20    -1
#5    21   23     1
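The sapply()/which() loop rescans the whole index vector once per run, so its cost grows with the vector length times the number of runs. A sketch that instead collects each run's positions in a single split() call is shown below. To keep it runnable without data.table, the run ids are built from rle() as a stand-in for rleid(A); for a vector without NAs like this one, both give the ids 1, 2, 3, ... in order.

```r
A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1, 1, 1, 1)

# Run ids 1,1,1,1,1,2,...,5,5,5 (stand-in for data.table::rleid(A))
r <- rle(A)
indx <- rep(seq_along(r$lengths), r$lengths)

# A list with one element per run id, holding the positions belonging to that run
pos <- split(seq_along(A), indx)

new_dat <- data.frame(
  start = vapply(pos, min, integer(1), USE.NAMES = FALSE),
  stop  = vapply(pos, max, integer(1), USE.NAMES = FALSE)
)
new_dat$value <- A[new_dat$start]
new_dat
```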

A more concise data.table approach, suggested by @Henrik:

library(data.table)
data.table(A)[ , .(from = .I[1], to = .I[.N], val = A[1]), by = rleid(A)][,-1]


#   from to val
#1:    1  5   1
#2:    6 10   0
#3:   11 15   2
#4:   16 20  -1
#5:   21 23   1
