23

I have some data in a list in which I need to find continuous runs of integers (my brain thinks rle, but I don't know how to use it here).

It's easier to look at the data set and explain what I'm after.

Here's the data view:

$greg
 [1]  7  8  9 10 11 20 21 22 23 24 30 31 32 33 49

$researcher
[1] 42 43 44 45 46 47 48

$sally
 [1] 25 26 27 28 29 37 38 39 40 41

$sam
 [1]  1  2  3  4  5  6 16 17 18 19 34 35 36

$teacher
[1] 12 13 14 15

Desired output:

$greg
 [1]  7:11, 20:24, 30:33, 49

$researcher
 [1] 42:48

$sally
 [1] 25:29, 37:41

$sam
 [1]  1:6, 16:19, 34:36

$teacher
 [1] 12:15

Using base packages, how can I replace each continuous span with a colon between its lowest and highest values, with commas between the non-continuous parts? Note that the data goes from a list of integer vectors to a list of character vectors.

MWE data:

z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L, 
    23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L, 
    26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L, 
    3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg", 
    "researcher", "sally", "sam", "teacher"))
Henrik
Tyler Rinker

6 Answers

13

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

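lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # a run starts at position 1 and wherever the gap to the previous value exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # each run ends just before the next one starts; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})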

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
Marius
  • This one I liked the most because I could understand everything you did. I made one small tweak to get `49:49` as `49` but that was the easy part. Thank you. – Tyler Rinker Feb 14 '13 at 06:13
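
For reference, here is one way such a tweak could look. This is only a sketch, not necessarily what the asker actually did: collapse_runs is a hypothetical name, the input is assumed to be sorted with no duplicates, and the small list below is a cut-down stand-in for z.

```r
# Hypothetical helper (not from the original answer): collapse a sorted
# integer vector into "lo:hi" runs, printing singletons without a colon.
collapse_runs <- function(x) {
  diffs <- c(1, diff(x))
  start_indexes <- c(1, which(diffs > 1))
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  starts <- x[start_indexes]
  ends <- x[end_indexes]
  # a run whose start equals its end is a singleton, so print it on its own
  pieces <- ifelse(starts == ends,
                   as.character(starts),
                   paste(starts, ends, sep = ":"))
  paste(pieces, collapse = ", ")
}

# cut-down version of the question's data, for illustration only
z_small <- list(greg = c(7:11, 20:24, 30:33, 49L), teacher = 12:15)
lapply(z_small, collapse_runs)
# $greg
# [1] "7:11, 20:24, 30:33, 49"
#
# $teacher
# [1] "12:15"
```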
9

Using IRanges:

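library(IRanges)  # Bioconductor package, not part of base R
lapply(z, function(x) {
    # Treat each integer as a width-1 range; reduce() merges adjacent ranges.
    rng <- as.data.frame(reduce(IRanges(x, x)))[, 1:2]  # keep start and end
    # unique() collapses start == end so singletons print without a colon.
    apply(rng, 1, function(r) paste(unique(r), collapse = ":"))
})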

# $greg
# [1] "7:11"  "20:24" "30:33" "49"   
# 
# $researcher
# [1] "42:48"
# 
# $sally
# [1] "25:29" "37:41"
# 
# $sam
# [1] "1:6"   "16:19" "34:36"
# 
# $teacher
# [1] "12:15"
qwr
Arun
6

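Here is an attempt using diff and tapply that returns a character vector. Note that the pairing of endpoints assumes a singleton can occur only at the end of a vector, as with 49 in greg.

runs <- lapply(z, function(x) {
  # positions after which the sequence breaks (renamed to avoid shadowing z)
  breaks <- which(diff(x) != 1)
  # endpoints of every run, in order: start1, end1, start2, end2, ...
  results <- x[sort(unique(c(1, length(x), breaks, breaks + 1)))]
  lr <- length(results)
  # pair consecutive endpoints; an unpaired last value is a trailing singleton
  collapse <- rep(seq_len(ceiling(lr / 2)), each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ":"))
})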

runs
$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
mnel
5

My solution is fairly similar to Marius's. His works as well as mine, but the mechanisms are slightly different, so I thought I may as well post it:

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}

lapply(z, findIntRuns)

Which produces:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
sebastian-c
5

Another short solution with lapply and tapply:

lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y) 
    paste(unique(range(y)), collapse = ":")
  ))
)

The result:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
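
To see why the grouping vector works, it may help to look at the intermediate values for a single element. The sketch below rebuilds greg by hand; the names gaps and run_id are introduced here purely for illustration.

```r
x <- c(7:11, 20:24, 30:33, 49L)   # same values as z$greg

gaps <- diff(x) != 1              # TRUE wherever the next value starts a new run
run_id <- c(0, cumsum(gaps))      # one id per run; the leading 0 labels x[1]
run_id
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3

# range() gives c(min, max) per run; unique() drops the repeat for singletons
labels <- unname(tapply(x, run_id, function(y)
  paste(unique(range(y)), collapse = ":")))
labels
# [1] "7:11"  "20:24" "30:33" "49"
```

Each TRUE in gaps bumps the running count by one, so every value in the same run shares an id, and tapply then summarises each id separately.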
Sven Hohenstein
3

Late to the party, but here's a deparse-based one-liner:

lapply(z, function(x) paste(sapply(split(x, cumsum(c(1, diff(x) - 1))), deparse), collapse = ", "))

$greg
[1] "7:11, 20:24, 30:33, 49L"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
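
The singleton in greg comes out as "49L" because deparse keeps the integer suffix on a lone value. A possible way to get a plain "49", sketched here as an assumption rather than as part of the original answer, is to strip a trailing L from each piece. Like the one-liner itself, this relies on the input being an integer vector.

```r
x <- c(7:11, 20:24, 30:33, 49L)   # same values as z$greg

# same grouping as the one-liner: the id jumps whenever the gap exceeds 1
runs <- split(x, cumsum(c(1, diff(x) - 1)))

# deparse() each run, then drop the "L" suffix that appears on singletons
pieces <- sub("L$", "", sapply(runs, deparse))
result <- paste(pieces, collapse = ", ")
result
# [1] "7:11, 20:24, 30:33, 49"
```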
James