R: add a dplyr group label as a number

Question

I can not get my head around this must be simple task. How to get a group label as a consecutive number.

library(dplyr)
set.seed(1)
df <- data.frame(id = sample(c('a','b'), 20, T),
                 name = sample(c('N1', 'N2', 'N3'), 20, T),
                 val = runif(20)) %>%
  group_by(id) %>% 
  arrange(id, name)

What I want is a label group_no that indicates the number of categories of the variable name within each id dplyr group. I can not find a solution in the dplyr package itself. Something like this:

# A tibble: 20 x 4
# Groups:   id [2]
   id    name     val group_no
   <fct> <fct>  <dbl>     <int>
 1 a     N1    0.647          1
 2 a     N1    0.530          1
 3 a     N1    0.245          1
 4 a     N2    0.693          2
 5 a     N2    0.478          2
 6 a     N2    0.861          2
 7 a     N3    0.821          3
 8 a     N3    0.0995         3
 9 a     N3    0.662          3
10 b     N1    0.553          1
11 b     N1    0.0233         1
12 b     N1    0.519          1
13 b     N2    0.783          2
14 b     N2    0.789          2
15 b     N2    0.477          2
16 b     N2    0.438          2
17 b     N2    0.407          2
18 b     N3    0.732          3
19 b     N3    0.0707         3
20 b     N3    0.316          3

Note, that the values of name could be anything and certainly are not normally suffixed by a number as in the example (otherwise I could do sub("^N", "", df$name).

I am looking for something a little different than the 1:n() solution in SO posts such as here.

`group_indices(df, name)`, but unfortunately [`group_indices` doesn't work inside `mutate`](https://github.com/tidyverse/dplyr/issues/1185)...You find some workarounds in the post though. — Henrik, Apr 30 '18 at 11:08

score 8 · Accepted Answer · answered Apr 30 '18 at 11:00

8

I think in this case something as simple as :

df %>%
  mutate(group_no = as.integer(name))

will work

# A tibble: 20 x 4
# Groups:   id [2]
   id    name     val group_no
   <fct> <fct>  <dbl>    <int>
 1 a     N1    0.647         1
 2 a     N1    0.530         1
 3 a     N1    0.245         1
 4 a     N2    0.693         2
 5 a     N2    0.478         2
 6 a     N2    0.861         2
 7 a     N3    0.821         3
 8 a     N3    0.0995        3
 9 a     N3    0.662         3
10 b     N1    0.553         1
11 b     N1    0.0233        1
12 b     N1    0.519         1
13 b     N2    0.783         2
14 b     N2    0.789         2
15 b     N2    0.477         2
16 b     N2    0.438         2
17 b     N2    0.407         2
18 b     N3    0.732         3
19 b     N3    0.0707        3
20 b     N3    0.316         3

answered Apr 30 '18 at 11:00

denis

5,580
1
13
40

1

That is such a nice trick! – hpesoj626 Apr 30 '18 at 11:02
thanks! It is actually a recurrent mistake i make :) – denis Apr 30 '18 at 11:02
Nice - but only works with factors (as in my example). I tend not to want factors. – user3375672 Apr 30 '18 at 11:05
1

@user3375672 indeed. But you could do `mutate(group_no = as.integer(as.factor(name)))` in that case, it would still work – denis Apr 30 '18 at 11:06
Sure - good point. I will accept this for its simplicity though the other suggestions work just as fine. – user3375672 Apr 30 '18 at 11:07

akrun · Answer 2 · 2018-04-30T11:04:49.210

We can do

df %>%
   group_by(id) %>%
   mutate(group_no = cumsum(c(TRUE, name[-1] != name[-n()])))

Or with match

df %>% 
  group_by(id) %>%
  mutate(group_no = match(name, unique(name)))
# A tibble: 20 x 4
# Groups:   id [2]
#   id    name     val group_no
#   <fct> <fct>  <dbl>    <int>
# 1 a     N1    0.647         1
# 2 a     N1    0.530         1
# 3 a     N1    0.245         1
# 4 a     N2    0.693         2
# 5 a     N2    0.478         2
# 6 a     N2    0.861         2
# 7 a     N3    0.821         3
# 8 a     N3    0.0995        3
# 9 a     N3    0.662         3
#10 b     N1    0.553         1
#11 b     N1    0.0233        1
#12 b     N1    0.519         1
#13 b     N2    0.783         2
#14 b     N2    0.789         2
#15 b     N2    0.477         2
#16 b     N2    0.438         2
#17 b     N2    0.407         2
#18 b     N3    0.732         3
#19 b     N3    0.0707        3
#20 b     N3    0.316         3

hpesoj626 · Answer 3 · 2018-04-30T11:10:46.323

2

Here is a solution that uses left_join.

df %>%
  left_join(df %>%
  group_by(id, name) %>%
  summarise(group_no = row_number()))

edited Apr 30 '18 at 11:10

answered Apr 30 '18 at 11:01

hpesoj626

3,529
1
17
25

R: add a dplyr group label as a number

3 Answers3

Linked

Related