Collapse continuous integer runs to strings of ranges

Question

I have some data in a list in which I need to find continuous runs of integers (my brain thinks rle, but I don't know how to use it here).

It's easier to look at the data set and explain what I'm after.

Here's the data view:

$greg
 [1]  7  8  9 10 11 20 21 22 23 24 30 31 32 33 49

$researcher
[1] 42 43 44 45 46 47 48

$sally
 [1] 25 26 27 28 29 37 38 39 40 41

$sam
 [1]  1  2  3  4  5  6 16 17 18 19 34 35 36

$teacher
[1] 12 13 14 15

Desired output:

$greg
 [1]  7:11, 20:24, 30:33, 49

$researcher
 [1] 42:48

$sally
 [1] 25:29, 37:41

$sam
 [1]  1:6, 16:19, 34:36

$teacher
 [1] 12:15

Using only base packages, how can I replace each continuous span with its lowest and highest values joined by a colon, with commas between the non-continuous parts? Note that the data goes from a list of integer vectors to a list of character vectors.

MWE data:

z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L, 
    23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L, 
    26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L, 
    3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg", 
    "researcher", "sally", "sam", "teacher"))

Your question is a bit similar to this one: http://stackoverflow.com/q/7077710/602276 — Andrie, Feb 14 '13 at 06:03

Marius · Accepted Answer · 2013-02-14T05:58:03.617

13

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

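lapply(z, function(x) {
  diffs <- c(1, diff(x))                          # gap to the previous element
  start_indexes <- c(1, which(diffs > 1))         # positions where a run begins
  # each run ends just before the next one starts; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})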

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"

edited Feb 14 '13 at 05:58

answered Feb 14 '13 at 05:52

Marius

This one I liked the most because I could understand everything you did. I made one small tweak to get `49:49` as `49` but that was the easy part. Thank you. – Tyler Rinker Feb 14 '13 at 06:13
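The tweak itself isn't shown in the thread, so the following is only a sketch of one way it might be done. `collapse_runs` is a made-up helper name, and the code assumes each vector is non-empty, sorted, and free of duplicates, as in the example data. It compares start and end positions and writes a bare number when they coincide.

```r
# Hypothetical helper (name not from the thread).
# Assumes x is a non-empty, sorted integer vector without duplicates.
collapse_runs <- function(x) {
  starts <- c(1, which(diff(x) != 1) + 1)   # index where each run begins
  ends   <- c(starts[-1] - 1, length(x))    # index where each run ends
  # a run of length one is written as a bare number instead of "49:49"
  pieces <- ifelse(starts == ends,
                   as.character(x[starts]),
                   paste(x[starts], x[ends], sep = ":"))
  paste(pieces, collapse = ", ")
}

collapse_runs(c(7:11, 20:24, 30:33, 49L))
# [1] "7:11, 20:24, 30:33, 49"
```

With the question's data this would be called as `lapply(z, collapse_runs)`.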

score 9 · Answer 2 · edited Dec 31 '22 at 02:23

Using IRanges:

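library(IRanges)
lapply(z, function(x) {
    # treat each integer as a width-1 range; reduce() merges adjacent ranges
    runs <- as.data.frame(reduce(IRanges(x, x)))[, 1:2]  # start and end columns
    # unique() collapses a singleton such as 49, 49 to "49"
    apply(runs, 1, function(r) paste(unique(r), collapse = ":"))
})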

# $greg
# [1] "7:11"  "20:24" "30:33" "49"   
# 
# $researcher
# [1] "42:48"
# 
# $sally
# [1] "25:29" "37:41"
# 
# $sam
# [1] "1:6"   "16:19" "34:36"
# 
# $teacher
# [1] "12:15"

edited Dec 31 '22 at 02:23

qwr

answered Feb 14 '13 at 06:01

Arun

Works very well. Not in base but useful for future searchers. Thank you. +1 – Tyler Rinker Feb 14 '13 at 06:12

Sure, for anything related to intervals it is better to use a package that implements `interval trees`. – Arun Feb 14 '13 at 06:16
Yeah this was the first time I've seen `IRanges` – Tyler Rinker Feb 14 '13 at 06:17

score 6 · Answer 3 · answered Feb 14 '13 at 06:07

Here is an attempt using diff and tapply, returning a character vector:

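runs <- lapply(z, function(x) {
  # positions where the next value is not x + 1, i.e. where a run ends
  breaks <- which(diff(x) != 1)
  # start and end of every run, in order; a singleton contributes its value twice
  results <- x[sort(c(1, length(x), breaks, breaks + 1))]
  # label the endpoints pairwise: 1, 1, 2, 2, ...
  collapse <- rep(seq_len(length(results) / 2), each = 2)
  # unique() turns a singleton pair such as 49, 49 into a plain "49"
  as.vector(tapply(results, collapse, function(v) paste(unique(v), collapse = ':')))
})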

runs
$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"

answered Feb 14 '13 at 06:07

mnel

When I think I'm getting good at R I look at code like this and realize I have a lot to learn +1 – Tyler Rinker Feb 14 '13 at 06:12
I'm not quite sure that is a compliment :). – mnel Feb 14 '13 at 06:15
No it is. There were some combinations of functions I wouldn't have thought to put together :-) I liked the creativity. – Tyler Rinker Feb 14 '13 at 06:15

score 5 · Answer 4 · answered Feb 14 '13 at 06:06

I have a fairly similar solution to Marius's. His works as well as mine, but the mechanisms are slightly different, so I thought I may as well post it:

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}

lapply(z, findIntRuns)

Which produces:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
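To see why the `split` call in `findIntRuns` separates the runs, it may help to look at the grouping vector on its own. This is only an illustration using the `greg` values; `run_id` is a name introduced here, not part of the answer.

```r
x <- c(7:11, 20:24, 30:33, 49L)

# TRUE wherever the step from the previous element is not 1, i.e. a new run starts;
# cumsum() then turns those flags into a running group number
run_id <- cumsum(c(1, diff(x)) != 1)
run_id
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3

# split() gathers the elements that share a group number
split(x, run_id)
```

Each element of the resulting list is one run, so only its first and last values are needed to build the label.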

score 5 · Answer 5 · answered Feb 14 '13 at 07:55

Another short solution with lapply and tapply:

lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y) 
    paste(unique(range(y)), collapse = ":")
  ))
)

The result:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"

score 3 · Answer 6 · answered Mar 29 '13 at 11:13

Late to the party, but here's a deparse-based one-liner:

lapply(z, function(x)
  paste(sapply(split(x, cumsum(c(1, diff(x) - 1))), deparse), collapse = ", "))

$greg
[1] "7:11, 20:24, 30:33, 49L"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
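Note the `49L` in the first result: `deparse` writes a single integer with its `L` suffix. If that is unwanted, one possible adjustment (a sketch, not part of the original one-liner) is to strip a trailing `L` from each piece before joining. `z_demo` and `res` are names made up for this example.

```r
# Small stand-in for the question's data (hypothetical name)
z_demo <- list(greg = c(7:11, 20:24, 30:33, 49L), teacher = 12:15)

res <- lapply(z_demo, function(x) {
  pieces <- sapply(split(x, cumsum(c(1, diff(x) - 1))), deparse)
  # drop the integer suffix that deparse() adds to singletons, e.g. "49L" -> "49"
  paste(sub("L$", "", pieces), collapse = ", ")
})

res$greg
# [1] "7:11, 20:24, 30:33, 49"
```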

answered Mar 29 '13 at 11:13

James

Nice approach +1 definitely late to the party ;) – Tyler Rinker Mar 29 '13 at 15:19
