2

I can not get my head around this must be simple task. How to get a group label as a consecutive number.

library(dplyr)
set.seed(1)
df <- data.frame(id = sample(c('a','b'), 20, T),
                 name = sample(c('N1', 'N2', 'N3'), 20, T),
                 val = runif(20)) %>%
  group_by(id) %>% 
  arrange(id, name)

What I want is a label group_no that indicates the number of categories of the variable name within each id dplyr group. I can not find a solution in the dplyr package itself. Something like this:

# A tibble: 20 x 4
# Groups:   id [2]
   id    name     val group_no
   <fct> <fct>  <dbl>     <int>
 1 a     N1    0.647          1
 2 a     N1    0.530          1
 3 a     N1    0.245          1
 4 a     N2    0.693          2
 5 a     N2    0.478          2
 6 a     N2    0.861          2
 7 a     N3    0.821          3
 8 a     N3    0.0995         3
 9 a     N3    0.662          3
10 b     N1    0.553          1
11 b     N1    0.0233         1
12 b     N1    0.519          1
13 b     N2    0.783          2
14 b     N2    0.789          2
15 b     N2    0.477          2
16 b     N2    0.438          2
17 b     N2    0.407          2
18 b     N3    0.732          3
19 b     N3    0.0707         3
20 b     N3    0.316          3

Note, that the values of name could be anything and certainly are not normally suffixed by a number as in the example (otherwise I could do sub("^N", "", df$name).

I am looking for something a little different than the 1:n() solution in SO posts such as here.

user3375672
  • 3,728
  • 9
  • 41
  • 70
  • 1
    `group_indices(df, name)`, but unfortunately [`group_indices` doesn't work inside `mutate`](https://github.com/tidyverse/dplyr/issues/1185)...You find some workarounds in the post though. – Henrik Apr 30 '18 at 11:08

3 Answers3

8

I think in this case something as simple as :

df %>%
  mutate(group_no = as.integer(name))

will work

# A tibble: 20 x 4
# Groups:   id [2]
   id    name     val group_no
   <fct> <fct>  <dbl>    <int>
 1 a     N1    0.647         1
 2 a     N1    0.530         1
 3 a     N1    0.245         1
 4 a     N2    0.693         2
 5 a     N2    0.478         2
 6 a     N2    0.861         2
 7 a     N3    0.821         3
 8 a     N3    0.0995        3
 9 a     N3    0.662         3
10 b     N1    0.553         1
11 b     N1    0.0233        1
12 b     N1    0.519         1
13 b     N2    0.783         2
14 b     N2    0.789         2
15 b     N2    0.477         2
16 b     N2    0.438         2
17 b     N2    0.407         2
18 b     N3    0.732         3
19 b     N3    0.0707        3
20 b     N3    0.316         3
denis
  • 5,580
  • 1
  • 13
  • 40
5

We can do

df %>%
   group_by(id) %>%
   mutate(group_no = cumsum(c(TRUE, name[-1] != name[-n()])))

Or with match

df %>% 
  group_by(id) %>%
  mutate(group_no = match(name, unique(name)))
# A tibble: 20 x 4
# Groups:   id [2]
#   id    name     val group_no
#   <fct> <fct>  <dbl>    <int>
# 1 a     N1    0.647         1
# 2 a     N1    0.530         1
# 3 a     N1    0.245         1
# 4 a     N2    0.693         2
# 5 a     N2    0.478         2
# 6 a     N2    0.861         2
# 7 a     N3    0.821         3
# 8 a     N3    0.0995        3
# 9 a     N3    0.662         3
#10 b     N1    0.553         1
#11 b     N1    0.0233        1
#12 b     N1    0.519         1
#13 b     N2    0.783         2
#14 b     N2    0.789         2
#15 b     N2    0.477         2
#16 b     N2    0.438         2
#17 b     N2    0.407         2
#18 b     N3    0.732         3
#19 b     N3    0.0707        3
#20 b     N3    0.316         3
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Here is a solution that uses left_join.

df %>%
  left_join(df %>%
  group_by(id, name) %>%
  summarise(group_no = row_number()))
hpesoj626
  • 3,529
  • 1
  • 17
  • 25