
I have the following dataset:

```r
varA <- c(rep("A", 2), rep("B", 4))
varB <- c(rep("aaaa", 2), rep("bbbb", 3), rep("cccc", 1))

dat <- data.frame(varA, varB)
dat
#>   varA varB
#> 1    A aaaa
#> 2    A aaaa
#> 3    B bbbb
#> 4    B bbbb
#> 5    B bbbb
#> 6    B cccc
```

I would like to generate ids for each varB subgroup within varA, such that the first subgroup is 1, the second is 2, etc. The ids can repeat across the dataset, just not between subgroups of the same varA group.

This is the needed result:

```
  varA varB res
1    A aaaa   1
2    A aaaa   1
3    B bbbb   1
4    B bbbb   1
5    B bbbb   1
6    B cccc   2
```

How can I do this in R?

I tried cur_group_id() in dplyr but it is not working for me...

thanks!!

Sotos
ZMacarozzi
  • Where is VarA?? Also why is B bbbb in group 1 and B cccc in group 2? What are the rules? – Sotos Dec 16 '21 at 07:57
  • Does this answer your question? [R - Group by variable and then assign a unique ID](https://stackoverflow.com/questions/39650511/r-group-by-variable-and-then-assign-a-unique-id) – Taren Sanders Dec 16 '21 at 07:59
  • In your case, since it's two variables, it would be `dat %>% mutate(id = group_indices(., varA, varB))` – Taren Sanders Dec 16 '21 at 08:00
  • @Sotos Thanks for the comments. The rule is that the ids must be generated WITHIN varA. That is, there is only one subgroup within A, so its id is 1. Within B there are 2 subgroups, which need to be given ids 1 and 2. – ZMacarozzi Dec 16 '21 at 08:06
  • @Taren Sanders, no, unfortunately the question is slightly different. Maybe my comment above will make it clearer what I am after. I tried your line of code but I get the error message "Problem with `mutate()` input `id`. i The `...` argument of `group_keys()` is deprecated as of dplyr 1.0.0. Please `group_by()` first". Any advice? – ZMacarozzi Dec 16 '21 at 08:06
  • Ah, never mind, I missed that the ids were repeating within varA. Sotos has a nice answer. – Taren Sanders Dec 16 '21 at 08:23

2 Answers


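You can use `data.table::rleid()` within each `varA` group, e.g.

```r
library(dplyr)

# rleid() numbers consecutive runs of equal values 1, 2, ...;
# grouping by varA first restarts the numbering in each group.
# The data.table package must be installed for the :: call to work.
dat %>% 
  group_by(varA) %>% 
  mutate(id = data.table::rleid(varB))

# A tibble: 6 x 3
# Groups:   varA [2]
#  varA  varB     id
#  <chr> <chr> <int>
#1 A     aaaa      1
#2 A     aaaa      1
#3 B     bbbb      1
#4 B     bbbb      1
#5 B     bbbb      1
#6 B     cccc      2
```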
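
Note that `rleid()` numbers runs of consecutive equal values, so a `varB` value that reappears later in the same `varA` group, after a different value, would get a new id. If the rows may not be sorted, one possible alternative is to number the subgroups by order of first appearance with `match()` and `unique()`. This is only a sketch using the question's data; the names `res` and `id` are chosen here for illustration.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  varA = c(rep("A", 2), rep("B", 4)),
  varB = c(rep("aaaa", 2), rep("bbbb", 3), rep("cccc", 1))
)

res <- dat %>%
  group_by(varA) %>%
  # Within each varA group, unique(varB) lists the subgroups in order of
  # first appearance; match() returns each row's position in that list.
  mutate(id = match(varB, unique(varB))) %>%
  ungroup()

res$id
#> [1] 1 1 1 1 1 2
```

This variant needs only dplyr and gives the same id to every row with the same `varB` value in a group, whether or not those rows are adjacent.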
Sotos

Another potential solution:

```r
library(tidyverse)
varA <- c(rep("A",2), rep("B",4))
varB <- c(rep("aaaa",2), rep("bbbb", 3), rep("cccc",1) )

dat <- data.frame(varA, varB)

dat %>%
  group_by(varA) %>%
  # count is 1 on the first row of each varA group and wherever varB
  # differs from the previous row, otherwise 0
  mutate(count = ifelse(varB != lag(varB, default = "NA"),
                        1, 0)) %>%
  # the running total of count is the subgroup id within varA
  mutate(rleid = cumsum(count))
#> # A tibble: 6 × 4
#> # Groups:   varA [2]
#>   varA  varB  count rleid
#>   <chr> <chr> <dbl> <dbl>
#> 1 A     aaaa      1     1
#> 2 A     aaaa      0     1
#> 3 B     bbbb      1     1
#> 4 B     bbbb      0     1
#> 5 B     bbbb      0     1
#> 6 B     cccc      1     2
```

Created on 2021-12-16 by the reprex package (v2.0.1)
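
The same change-detection idea can probably be written in one step, without the helper `count` column and without the `"NA"` string placeholder, which would compare as equal if `varB` ever contained the literal string "NA". The following is a sketch under the assumption that `varB` has no missing values; the names `res2` and `id` are illustrative and not part of the answer above.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  varA = c(rep("A", 2), rep("B", 4)),
  varB = c(rep("aaaa", 2), rep("bbbb", 3), rep("cccc", 1))
)

res2 <- dat %>%
  group_by(varA) %>%
  # The condition is TRUE on the first row of each group and on every row
  # where varB differs from the previous row; cumsum() of that logical
  # vector is an integer id that restarts at 1 in each varA group.
  mutate(id = cumsum(row_number() == 1 | varB != lag(varB))) %>%
  ungroup()

res2$id
#> [1] 1 1 1 1 1 2
```

On the first row of a group `lag(varB)` is `NA`, so the comparison is `NA`, but `TRUE | NA` evaluates to `TRUE`, which is why no `default` is needed here.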

jared_mamrot