5

Consider the following comma-separated string of numbers:

s <- "1,2,3,4,8,9,14,15,16,19"
s
# [1] "1,2,3,4,8,9,14,15,16,19"

Is it possible to collapse runs of consecutive numbers into their corresponding ranges? For example, the run 1,2,3,4 above would be collapsed to the range 1-4. The desired result looks like the following string:

s
# [1] "1-4,8,9,14-16,19"
Henrik
user969113

4 Answers

8

I took some heavy inspiration from the answers in this question.

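findIntRuns <- function(run){
  # difference to the previous element; the leading 1 keeps the first element in the first group
  rundiff <- c(1, diff(run))
  # a new group starts wherever the difference is not 1
  difflist <- split(run, cumsum(rundiff!=1))
  unlist(lapply(difflist, function(x){
    # runs of one or two numbers stay as they are; longer runs become "first-last"
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}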

s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))

paste0(findIntRuns(s2), collapse=",")
# [1] "1-4,8,9,14-16,19"
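
To see why the `cumsum(rundiff != 1)` trick groups the numbers into runs, it may help to look at the intermediate values. This is only an illustrative walk-through of the steps inside `findIntRuns`, using the same input; the name `group` is mine and does not appear in the function.

```r
run <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)

# Step 1: difference to the previous element (the first element gets a dummy 1)
rundiff <- c(1, diff(run))
rundiff
# [1] 1 1 1 1 4 1 5 1 1 3

# Step 2: every difference other than 1 bumps the group counter
group <- cumsum(rundiff != 1)   # 'group' is an illustrative name
group
# [1] 0 0 0 0 1 1 2 2 2 3

# Step 3: split() collects the numbers that share a group id
pieces <- split(run, group)
lengths(pieces)
# 0 1 2 3
# 4 2 3 1
```

Groups of length 1 or 2 are then printed as they are, and longer groups are printed as "first-last".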

EDIT: Multiple solutions: benchmarking time!

Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000

@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
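
For anyone who wants to rerun the timings, a comparison like the one above can be set up roughly as follows. This is a sketch under assumptions: the wrappers `spee()`, `seb()` and `marc()` are not shown in this answer, so the `seb()` below is my guess at what such a wrapper looks like, and the timing step assumes the `microbenchmark` package is installed (it is skipped otherwise).

```r
# Copy of findIntRuns() from this answer, so the block runs on its own
findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff != 1))
  unlist(lapply(difflist, function(x){
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names = FALSE)
}

s <- "1,2,3,4,8,9,14,15,16,19"

# Assumed wrapper: the original seb() is not shown in the answer
seb <- function() {
  paste0(findIntRuns(as.numeric(unlist(strsplit(s, ",")))), collapse = ",")
}

# Only run the timing if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(seb(), times = 1000L))
}
```

The other two wrappers would be added as further arguments to `microbenchmark()` in the same way.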

sebastian-c
  • Will this run any faster if you use my infamous `seqle` variant of `rle`? http://stackoverflow.com/questions/8466807/rle-like-function-that-catches-run-of-adjacent-integers/8467663#8467663 – Carl Witthoft Jun 04 '13 at 11:48
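
For reference, an `rle`-based variant could look something like the sketch below. To be clear, this is not the linked `seqle` function; `collapse_runs_rle` is a hypothetical helper that only uses base `rle()`, relying on the fact that `x - seq_along(x)` stays constant within a run of consecutive integers. I have not timed it against the other solutions.

```r
# Hypothetical helper (not the linked seqle()): collapse runs using base rle()
collapse_runs_rle <- function(x) {
  # within a run of consecutive integers, x - seq_along(x) is constant
  runs <- rle(x - seq_along(x))
  ends <- cumsum(runs$lengths)          # index of the last element of each run
  starts <- ends - runs$lengths + 1     # index of the first element of each run
  parts <- mapply(function(a, b) {
    # three or more numbers become "first-last", shorter runs are listed
    if (b - a >= 2) paste0(x[a], "-", x[b]) else paste(x[a:b], collapse = ",")
  }, starts, ends)
  paste(parts, collapse = ",")
}

s <- "1,2,3,4,8,9,14,15,16,19"
collapse_runs_rle(as.numeric(unlist(strsplit(s, ","))))
# [1] "1-4,8,9,14-16,19"
```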
3

I was too slow... but here's another solution.

It uses fewer R-specific functions, so it could be ported to other languages (on the other hand, it may be less elegant).

s <- "1,2,3,4,8,9,14,15,16,19"

collapseConsecutive <- function(s){
  x <- as.numeric(unlist(strsplit(s, ",")))

  x_0 <- x[1]
  out <- toString(x[1])
  hasDash <- FALSE

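  # seq_along(x)[-1] is empty for a single number, whereas 2:length(x) would count down to c(2, 1)
  for(i in seq_along(x)[-1]) {
    x_1 <- x[i]
    x_2 <- x[i+1]  # NA once i is the last index

    if((x_0 + 1) == x_1 && !is.na(x_2) && (x_1 + 1) == x_2) {
      # x_1 lies in the interior of a run: emit one dash and skip the number
      if(!hasDash) {
        out <- c(out, "-")
        hasDash <- TRUE
      }
    } else {
      # x_1 either closes a run (the dash is already there) or starts a new entry
      if(!hasDash) {
        out <- c(out, ",")
      }
      out <- c(out, x_1)
      hasDash <- FALSE
    }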
    x_0 <- x_1
  }
  outString <- paste(out, collapse="")
  outString
}

collapseConsecutive(s)
# [1] "1-4,8,9,14-16,19"
speendo
1

Another fairly compact option

in.seq <- function(x) {
    # returns TRUE for elements within ascending sequences
    (c(diff(x, 1), NA) == 1 & c(NA, diff(x,2), NA) == 2)
    }

contractSeqs <-  function(x) {
    # returns string formatted with contracted sequences
    x[in.seq(x)] <- ""
    gsub(",{2,}", "-", paste(x, collapse=","), perl=TRUE)
    }

s <- "1,2,3,4,8,9,14,15,16,19"

s1 <- as.numeric(unlist(strsplit(s, ","))) # as earlier answers

# assumes: numeric vector, length > 2, positive integers, ascending sequences

contractSeqs(s1)
# [1] "1-4,8,9,14-16,19"
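
The way the two steps fit together is easier to see with the intermediate values printed. The following is just a walk-through of the same logic on the example input, written inline so it runs on its own; the names `interior`, `y`, `placeholder` and `result` are illustrative and not part of the functions above.

```r
x <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)

# TRUE where an element has a neighbour exactly 1 below and 1 above it
# (same expression as in in.seq); the first and last positions come out as NA
interior <- c(diff(x, 1), NA) == 1 & c(NA, diff(x, 2), NA) == 2
which(interior)
# [1] 2 3 8

# blank out the interior elements, then join with commas
y <- x
y[which(interior)] <- ""
placeholder <- paste(y, collapse = ",")
placeholder
# [1] "1,,,4,8,9,14,,16,19"

# every run of two or more commas marks a contracted sequence
result <- gsub(",{2,}", "-", placeholder, perl = TRUE)
result
# [1] "1-4,8,9,14-16,19"
```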

I also wrote a bells & whistles version that can handle both numeric and string input including named objects, descending sequences and alternative punctuation, as well as performing error checking and reporting. If anyone is interested, I can add this to my answer.

IanRiley
0

Here's a function that should do what you want:

conseq <- function(s){
  s <- as.numeric(unlist(strsplit(s, ",")))
  dif <- diff(s)              # gaps between neighbouring numbers
  new <- c(TRUE, dif != 1)    # TRUE where a new run starts
  cs <- cumsum(new)           # run id for every number
  res <- vector(mode="list", max(cs))
  for(i in seq(res)){
    s.i <- s[which(cs == i)]
    if(length(s.i) > 2){
      res[[i]] <- paste(min(s.i), max(s.i), sep="-")
    } else {
      res[[i]] <- as.character(s.i)
    }
  }
  paste(unlist(res), collapse=",")
}

Example

> s <- "1,2,3,4,8,9,14,15,16,19"
> conseq(s)
[1] "1-4,8,9,14-16,19"
Marc in the box