I have a problem which seems quite simple but I have not been able to find a nice way of solving it.

Suppose I have a vector of numbers, here representing years, for example

c(2000,2001,2002,2003, 2005, 2007,2008,2009,2010)

I would like to get back a single string that, rather than listing every number (which would be quite long), collapses runs of consecutive years into intervals wherever possible: "2000-2003, 2005, 2007-2010".

Does anyone have an easy way of doing this in general?

Andrew Gustar
Katrine
  • Read the help of the `cut` function using `?cut` – Marco Sandri Sep 23 '17 at 13:36
  • Related: [Continuous integer runs](https://stackoverflow.com/questions/14868406/continuous-integer-runs/14868742#14868742); [R - collapse consecutive or running numbers](https://stackoverflow.com/questions/16911773/r-collapse-consecutive-or-running-numbers) – Henrik Sep 23 '17 at 14:10
  • Don't forget to accept the best answer by clicking on the grey check mark under the downvote button. – acylam Oct 02 '17 at 18:50

2 Answers

Here is one way to do it.

nums <- c(2000,2001,2002,2003, 2005, 2007,2008,2009,2010)

numRanges <- function(nums){
  nums <- sort(nums) #sort in case they are in random order!
  paste(tapply(nums, 
               cumsum(c(1, diff(nums)!=1)), #grouping indicator
               function(x) paste(min(x), #first number of each group
                                 ifelse(length(x)==1, "", max(x)), #last number if required
                                 sep = ifelse(length(x)==1, "", "-"))),
        collapse=", ") #paste the above together into a single string
}

numRanges(nums)
# [1] "2000-2003, 2005, 2007-2010"
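
The grouping indicator is the key step in the function above, so it may help to look at its intermediate values. The following is only an illustrative sketch: the names `gaps`, `starts` and `run_id` are introduced here for explanation and do not appear in the original function.

```r
nums <- c(2000, 2001, 2002, 2003, 2005, 2007, 2008, 2009, 2010)

# Differences between neighbours: 1 inside a run, larger at a break
gaps <- diff(nums)            # 1 1 1 2 2 1 1 1

# 1 wherever a new run starts; the leading 1 opens the first run
starts <- c(1, gaps != 1)     # 1 0 0 0 1 1 0 0 0

# The running total of run starts gives every element a run id
run_id <- cumsum(starts)      # 1 1 1 1 2 3 3 3 3

# split() shows the groups that tapply() then summarises
groups <- split(nums, run_id)
groups
```

Each element of `groups` is one run of consecutive years, which is why taking `min` and `max` per group is enough to build the interval labels.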
Andrew Gustar
  • Instead of `max` and `min`, you could use `range`: `toString(sapply(split(nums, cumsum(c(1, diff(nums) != 1))), function(x) ifelse(length(x) > 1, paste(range(x), collapse = "-"), x)))` – d.b Sep 23 '17 at 14:56
You can also use `seqle` from the cgwtools package, which extends base R's `rle` to runs of consecutive values:

year = c(2000,2001,2002,2003, 2005, 2007,2008,2009,2010)

library(dplyr)
library(cgwtools)

seqle(year) %>%
  {paste0(.$values, "-", .$values+(.$lengths-1))} %>%
  toString() %>%
  gsub("\\b(\\d+)-\\1\\b", "\\1", .)  # \\b avoids partial matches, e.g. the "2-2" inside "1992-2003"

# [1] "2000-2003, 2005, 2007-2010"

`seqle` run-length encodes the consecutive runs in `year` and returns each run's starting value (`values`) and length (`lengths`), so the last year of a run is `values + lengths - 1`. `gsub` then collapses single-year ranges such as 2005-2005 to 2005, as desired.

> seqle(year)
Run Length Encoding
  lengths: int [1:3] 4 1 4
  values : num [1:3] 2000 2005 2007
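
If installing cgwtools is not an option, the same starting values and lengths can be recovered with base R's `rle`. The following is a minimal sketch under that assumption; `year_ranges` is a hypothetical helper name, not a function from cgwtools or from this answer.

```r
year <- c(2000, 2001, 2002, 2003, 2005, 2007, 2008, 2009, 2010)

# Hypothetical helper: base-R stand-in for the seqle() pipeline
year_ranges <- function(x) {
  x <- sort(unique(x))                      # tolerate unsorted or repeated input
  if (length(x) == 0) return("")
  run_id  <- cumsum(c(1, diff(x) != 1))     # id of the consecutive run each value belongs to
  lengths <- rle(run_id)$lengths            # plays the role of seqle()$lengths
  values  <- x[!duplicated(run_id)]         # first value of each run, like seqle()$values
  ends    <- values + lengths - 1
  # Single-element runs are printed alone, so no gsub() clean-up is needed
  toString(ifelse(lengths == 1, values, paste0(values, "-", ends)))
}

year_ranges(year)
# [1] "2000-2003, 2005, 2007-2010"
```

Choosing the label per run with `ifelse` sidesteps the regular-expression step entirely, at the cost of a few more lines.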
acylam