Create group names for consecutive values

Question

This looks like an easy task, but I can't figure out a simpler way. I have the vector x below and need to create group names for runs of consecutive identical values. My attempt uses rle; are there better ideas?

# data
x <- c(1,1,1,2,2,2,3,2,2,1,1)

# make groups
rep(paste0("Group_", 1:length(rle(x)$lengths)), rle(x)$lengths)
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4"
# [9] "Group_4" "Group_5" "Group_5"

Why not use paste directly? `paste0('groupe_', c(1,1,1,2,2,2,3,2,2,1,1))` – Mamoun Benghezal, Jun 14 '16 at 10:18
Because the last two groups would be 2 and 1 instead of 4 and 5 if you paste directly. – Sotos, Jun 14 '16 at 10:18
@MamounBenghezal please check the expected output, first `1` is a `Group_1`, and last `1` is a `Group_5` — zx8754, Jun 14 '16 at 10:19
Nice attempt. A key line in the source code of `rle` makes use of `diff` as @Roland did below. — Joseph Wood, Jun 14 '16 at 13:12
But.. having done that, how do you map these `Group_x` names to the actual values & run lengths? That is, what's the point of this exercise? — Carl Witthoft, Jun 14 '16 at 14:07
@CarlWitthoft names are in the same order as the values, so direct map, i.e.: `names(x) <- myGroups`. My actual data is data.frame, so I can apply the same and create a `Group` column for aggregate functions down the line. — zx8754, Jun 14 '16 at 14:12
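
The data.frame use case mentioned in the last comment might look roughly like the sketch below. The data frame `df` and its `y` column are made up for illustration; only `x` comes from the question. The run-length encoding is computed once and reused, which also avoids calling `rle` twice.

```r
# Hypothetical data frame; the y column is invented for illustration
df <- data.frame(x = c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1), y = 1:11)

# Compute the run-length encoding once and reuse it
r <- rle(df$x)
df$Group <- rep(paste0("Group_", seq_along(r$lengths)), r$lengths)

# Aggregate y within each run of consecutive x values
totals <- aggregate(y ~ Group, data = df, FUN = sum)
totals
```

Because each run gets its own name, the two separate runs of 1 (Group_1 and Group_5) are aggregated separately rather than being merged.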

score 11 · Answer 1 · edited Oct 26 '21 at 11:45

Using rleid from data.table,

library(data.table)

rleid(x, prefix = "Group_")
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

edited Oct 26 '21 at 11:45

zx8754

answered Jun 14 '16 at 10:25

Sotos

score 10 · Accepted Answer · answered Jun 14 '16 at 10:32

Using diff and cumsum:

paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

(If your values are floating point values, you might have to avoid != and use a tolerance instead.)
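
As a rough sketch of the tolerance idea: a new group starts wherever the absolute difference to the previous value exceeds a small threshold. The vector `x_num` and the choice `tol <- sqrt(.Machine$double.eps)` are assumptions made for this illustration, not part of the answer; a suitable tolerance depends on the data.

```r
# Hypothetical floating point data: 0.1 + 0.2 is not exactly 0.3
x_num <- c(0.1 + 0.2, 0.3, 0.3, 1.5, 1.5)

# Assumed tolerance; adjust to the scale of your data
tol <- sqrt(.Machine$double.eps)

# Exact comparison splits the first value into its own group
exact <- paste0("Group_", cumsum(c(1, diff(x_num) != 0)))

# Comparison with tolerance treats nearly equal neighbours as one run
tolerant <- paste0("Group_", cumsum(c(1, abs(diff(x_num)) > tol)))

exact
tolerant
```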

answered Jun 14 '16 at 10:32

Roland

If they might not be numeric - `paste0("Group_", cumsum(c(TRUE, head(x,-1)!=tail(x,-1))))` – thelatemail Jun 14 '16 at 10:33
My numbers have no floating points, so `!=` should be OK, but what do you mean by tolerance? – zx8754 Jun 14 '16 at 10:39
`abs(diff(x)) < tol` with `tol` based on `help(".Machine")`. – Roland Jun 14 '16 at 10:40
Nice - I'm guessing this is faster than `rle(x)` and processing the output from that. OTOH, I would want to know how to map the group names to the runs, in which case might as well use `rle(x)$lengths` . – Carl Witthoft Jun 14 '16 at 14:08
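
One possible way to map group names back to the runs, as asked in the last comment, is to build a small lookup table from the `rle` output. This is only a sketch; the table `runs` and its column names are chosen here for illustration.

```r
x <- c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1)
r <- rle(x)

# One row per run: its group name, value, length and starting index
runs <- data.frame(
  group  = paste0("Group_", seq_along(r$lengths)),
  value  = r$values,
  length = r$lengths,
  start  = cumsum(c(1, head(r$lengths, -1)))
)
runs
```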

score 3 · Answer 3 · edited Jun 14 '16 at 21:57

Using cumsum but not relying on the data being numeric:

paste0("Group_", 1 + c(0, cumsum(x[-length(x)] != x[-1])))


#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
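
To illustrate the non-numeric case, here is the same expression applied to a character vector. The vector `x_chr` is a made-up example, not data from the question.

```r
# Hypothetical character data
x_chr <- c("a", "a", "b", "b", "a")

# Compare each element with its successor; every change starts a new group
groups_chr <- paste0("Group_", 1 + c(0, cumsum(x_chr[-length(x_chr)] != x_chr[-1])))
groups_chr
#[1] "Group_1" "Group_1" "Group_2" "Group_2" "Group_3"
```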

edited Jun 14 '16 at 21:57

answered Jun 14 '16 at 13:04

score 2 · Answer 4 · answered Jul 26 '19 at 01:32

group() from groupdata2 can create groups from a list of group starting points, using the l_starts method. Setting n to "auto" makes it find the group starts automatically:

# Attach the package so group() and differs_from_previous() are both available
library(groupdata2)

x <- c(1,1,1,2,2,2,3,2,2,1,1)
group(x, n = "auto", method = "l_starts")

## # A tibble: 11 x 2
## # Groups:   .groups [5]
##     data .groups
##    <dbl> <fct>  
##  1     1 1      
##  2     1 1      
##  3     1 1      
##  4     2 2      
##  5     2 2      
##  6     2 2      
##  7     3 3      
##  8     2 4      
##  9     2 4      
## 10     1 5      
## 11     1 5

There's also the differs_from_previous() function which finds values, or indices of values, that differ from the previous value by some threshold(s).

# The values to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both")
## [1] 2 3 2 1

# The indices to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both",
                      return_index = TRUE)
## [1] 4 7 8 10
