
Given a partially sorted vector:

A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1)

The aim is to deconstruct this vector into a table that shows each distinct value and the index range it covers:

 start  end  value
     1    5      1
     6   10      0
    11   15      2
    16   20     -1

I tried using the diff function but cannot seem to find a good way to cluster the values into the required ranges.

  • would the similar values be always together? What if you have a vector `A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1, 1, 1, 1)` ? – Ronak Shah Nov 24 '18 at 15:00
  • The vector is in reality much longer and the distinct values repeat - so in that case I'd need another column in the table saying 21 - 23 - 1 (so it's about the sequence, not unique values). – Maximilian Niroomand Nov 24 '18 at 15:09
  • Related: [How can I find the indices of a continuous string of numbers?](https://stackoverflow.com/questions/40368551/how-can-i-find-the-indices-of-a-continuous-string-of-numbers); [Find start and end positions/indices of runs/consecutive values](https://stackoverflow.com/questions/43875716/find-start-and-end-positions-indices-of-runs-consecutive-values) – Henrik Nov 24 '18 at 22:47

2 Answers


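Using rle() (run-length encoding): the run lengths give each run's end via a cumulative sum, and the start follows from the end and the length.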

A <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,
      -1, -1, -1, -1, -1, 1, 1, 1, 0, 0, 0, 0)
rled <- as.data.frame(unclass(rle(A)))

rled$end <- cumsum(rled$lengths)
rled$start <- rled$end - rled$lengths + 1

rled[, c("start", "end", "values")]


#   start end values
# 1     1   5      1
# 2     6  10      0
# 3    11  15      2
# 4    16  20     -1
# 5    21  23      1
# 6    24  27      0
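
To make the arithmetic above easier to follow, here is a minimal sketch on a shorter vector, showing the pieces that rle() returns and how the boundaries fall out of them. The second half derives the same boundaries with diff(), which is the route the question mentions; the names end_diff and start_diff are illustrative only, and the sketch assumes a non-empty vector without NA values.

```r
A <- c(1, 1, 1, 0, 0, 2)

r <- rle(A)
r$lengths   # 3 2 1  (length of each run)
r$values    # 1 0 2  (value of each run)

# A run ends where the cumulative length ends; it starts `length - 1` earlier.
end   <- cumsum(r$lengths)       # 3 5 6
start <- end - r$lengths + 1     # 1 4 6

# Same boundaries via diff(): a nonzero difference marks the last
# position of a run, and the final element always closes the last run.
end_diff   <- c(which(diff(A) != 0), length(A))
start_diff <- c(1, head(end_diff, -1) + 1)

data.frame(start = start_diff, end = end_diff, value = A[start_diff])
```

Both routes give the same table. rle() is usually the shorter one, since it returns the values alongside the lengths.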
AkselA

We can use rleid() from data.table, which assigns an id to each run of identical consecutive values. We loop over each run id, find its first and last position in the original sequence, and convert the result into a data.frame.

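library(data.table)

# rleid() assigns one integer id per run of identical consecutive values
indx <- rleid(A)
new_dat <- data.frame(t(sapply(unique(indx), function(x) {
  val <- which(indx == x)               # positions belonging to run x
  c(start = min(val), stop = max(val))  # first and last position of the run
})))

# look up each run's value at its start position
transform(new_dat, value = A[new_dat$start])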


#  start stop value
#1     1    5     1
#2     6   10     0
#3    11   15     2
#4    16   20    -1

When a value reappears in a later run, each run still gets its own row:

A <- c(1,1,1,1,1,0,0,0,0,0,2,2,2,2,2,-1,-1,-1,-1,-1, 1, 1, 1)

indx <- rleid(A)
new_dat <- data.frame(t(sapply(unique(indx), function(x) {
  val <- which(indx == x)
  c(start = min(val), stop = max(val))
})))

transform(new_dat, value = A[new_dat$start])


#  start stop value
#1     1    5     1
#2     6   10     0
#3    11   15     2
#4    16   20    -1
#5    21   23     1
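
If data.table is not available, the same grouping idea can be sketched in base R. The run_id vector below is a stand-in for what rleid() computes, not its actual implementation, and the names run_id, pos and res are illustrative. The sketch assumes a non-empty vector without NA values.

```r
A <- c(1, 1, 0, 0, 0, 2, 1, 1)

# Start a new run wherever an element differs from its predecessor;
# the cumulative sum of those flags numbers the runs 1, 2, 3, ...
run_id <- cumsum(c(TRUE, A[-1] != A[-length(A)]))
run_id   # 1 1 2 2 2 3 4 4

# First and last position of each run, grouped by run id
pos <- seq_along(A)
res <- data.frame(
  start = as.vector(tapply(pos, run_id, min)),
  stop  = as.vector(tapply(pos, run_id, max))
)
res$value <- A[res$start]
res
```

Here the value 1 appears in two separate runs (positions 1 to 2 and 7 to 8), and each gets its own row because the grouping is by run id rather than by value.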

A more concise data.table approach, suggested by @Henrik:

library(data.table)
data.table(A)[ , .(from = .I[1], to = .I[.N], val = A[1]), by = rleid(A)][,-1]


#   from to val
#1:    1  5   1
#2:    6 10   0
#3:   11 15   2
#4:   16 20  -1
#5:   21 23   1
Ronak Shah