R: group dates that are next to each other

Question

I have a sequence of dates (years) that is irregular.

Specifically, 2004 is followed by 2005, 2006 is missing, 2007 is present and followed by 2008, and then the sequence skips to 2014.

# data input
df_in <- 
  data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))

# desired result
df_out <- 
  data.frame(df_in, grp = c(1L, 1L, 2L, 2L, 3L, 3L, 3L))

   seq grp
1 2004   1
2 2005   1
3 2007   2
4 2008   2
5 2014   3
6 2015   3
7 2016   3

I would like to find a way to generate groups of years that are next to each other. So group 1 would contain the years 2004 and 2005, group 2 the years 2007 and 2008, and group 3 the years 2014 to 2016.

Any help would be appreciated.

score 1 · Accepted Answer · answered Feb 11 '20 at 15:41

How about:

df_in$group = 1 + c(0, cumsum(ifelse(diff(df_in$seq) > 1, 1, 0)))

The idea here is that diff() calculates the lagged difference between consecutive years. Whenever that difference is more than 1, there is a gap, so we add one to the group. cumsum() then calculates the cumulative sum of the times we've encountered a gap, i.e. the index of the current group. The c(0, ...) is there because the output of diff() is one element shorter than our data, and we need a value for the first element. Finally, the 1 + is just for optics, so that the first group is 1 instead of 0.

> df_in$group 
[1] 1 1 2 2 3 3 3
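To make each step of the explanation above visible, here is a minimal sketch that splits the same one-liner into named intermediate values. The names d, gap and grp0 are only illustrative; the logic is unchanged.

```r
# Same data as in the question
df_in <- data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))

# Step 1: lagged differences between consecutive years (one shorter than the data)
d <- diff(df_in$seq)
print(d)             # 1 2 1 6 1 1

# Step 2: flag each gap (difference greater than 1) with 1, otherwise 0
gap <- ifelse(d > 1, 1, 0)
print(gap)           # 0 1 0 1 0 0

# Step 3: running count of gaps, padded with 0 for the first row
grp0 <- c(0, cumsum(gap))
print(grp0)          # 0 0 1 1 2 2 2

# Step 4: shift by 1 so that the first group is 1 instead of 0
df_in$group <- 1 + grp0
print(df_in$group)   # 1 1 2 2 3 3 3
```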

score 1 · Answer 2 · answered Feb 11 '20 at 15:46


cumsum(c(1, diff(df_in$seq)) != 1) + 1
[1] 1 1 2 2 3 3 3
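This one-liner works the same way, but prepends 1 before comparing, so no padding is needed afterwards. The sketch below unpacks it into intermediate values; the names steps, new_run and grp are illustrative only. One behavioural difference worth noting: because it tests != 1 rather than > 1, a repeated year (difference 0) would start a new group here, whereas the accepted answer would keep it in the current group.

```r
df_in <- data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))

# Prepend 1 so the first row counts as "directly follows" and the
# vector has the same length as the data
steps <- c(1, diff(df_in$seq))   # 1 1 2 1 6 1 1

# TRUE wherever a year does NOT directly follow the previous one
new_run <- steps != 1            # FALSE FALSE TRUE FALSE TRUE FALSE FALSE

# cumsum() counts each TRUE as 1, so every gap bumps the group id;
# + 1 makes the first group 1 instead of 0
grp <- cumsum(new_run) + 1
print(grp)                       # 1 1 2 2 3 3 3
```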


s_baldur


score 0 · Answer 3 · answered Feb 11 '20 at 15:37

This is the best I could come up with, but it'd be great if someone else has a more elegant solution:

df_in <- data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))

Define the maximal distance allowed between consecutive elements within a group:

max_range_within_group <- 1

Calculate the existing distances between consecutive years:

diffs <- df_in$seq[-1] - df_in$seq[-length(df_in$seq)]
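The manual subtraction above (everything but the first element minus everything but the last) is the lag-1 difference, which base R also provides as diff(). A quick check, assuming the same df_in:

```r
df_in <- data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))

# Manual lagged difference: drop the first element, drop the last, subtract
diffs <- df_in$seq[-1] - df_in$seq[-length(df_in$seq)]

# diff() with its default lag = 1 computes the same values
print(diffs)                              # 1 2 1 6 1 1
print(identical(diffs, diff(df_in$seq)))  # TRUE
```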

Iterate through the distances and check whether each one is within the 'allowed' distance; if it is not, increase grp by 1:

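grp <- 1  # the first year always starts group 1
for (d in diffs) {  # loop variable named d to avoid masking base::diff()
  # stay in the current group if the gap is small enough, otherwise start a new one
  nextGrp <- if (d <= max_range_within_group) {
    grp[length(grp)]
  } else {
    grp[length(grp)] + 1
  }
  grp <- c(grp, nextGrp)
}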

Bind grp to the data frame:

df_in$grp <- grp
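The loop can also be written without growing a vector, in the same cumsum() style as the other answers while keeping the max_range_within_group parameter. This is a sketch, not the author's code; like the loop, it assumes the years are sorted in increasing order. The loop is repeated so that the two results can be compared.

```r
df_in <- data.frame(seq = c(2004L, 2005L, 2007L, 2008L, 2014L, 2015L, 2016L))
max_range_within_group <- 1

# Loop version from this answer, condensed (loop variable renamed to d)
diffs <- diff(df_in$seq)
grp <- 1
for (d in diffs) {
  grp <- c(grp, if (d <= max_range_within_group) grp[length(grp)] else grp[length(grp)] + 1)
}

# Vectorized version: a new group starts wherever the distance to the
# previous year exceeds the allowed range; the leading FALSE covers the
# first row, which always belongs to group 1
grp_vec <- 1 + cumsum(c(FALSE, diff(df_in$seq) > max_range_within_group))

print(grp_vec)              # 1 1 2 2 3 3 3
print(all(grp == grp_vec))  # TRUE
```

Setting max_range_within_group to 2, for example, should put 2004 through 2008 into a single group, because the largest gap among those years is 2.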

This returns:

> df_in
   seq grp
1 2004   1
2 2005   1
3 2007   2
4 2008   2
5 2014   3
6 2015   3
7 2016   3
