
Example data:

example_data <-
  data.frame(value = c(1,3,4,6,7,8,4,6,9,0),
             group = c("Not applicable",
                       "Large group",
                       "Large group",
                       "Not applicable",
                       "Group of 1",
                       "Large group",
                       "Large group",
                       "Large group",
                       "Group of 1",
                       "Not applicable"))

I would like to assign group numbers, starting with 1, to groups (both "Large group" and "Group of 1"), and zeroes to "Not applicable" values, using dplyr.

There can be more than one "Not applicable" value in a row. A "Group of 1" always contains exactly one row. A "Large group" may contain any number of rows.

Desired output:

   value          group group_number
1      1 Not applicable            0
2      3    Large group            1
3      4    Large group            1
4      6 Not applicable            0
5      7     Group of 1            2
6      8    Large group            3
7      4    Large group            3
8      6    Large group            3
9      9     Group of 1            4
10     0 Not applicable            0

I tried this solution from the answers to my previous question:

example_data %>%
  mutate(group_number = with(rle(group != "Not applicable"), 
                      rep(cumsum(values) * values, lengths)))

And got:

   value          group group_number
1      1 Not applicable            0
2      3    Large group            1
3      4    Large group            1
4      6 Not applicable            0
5      7     Group of 1            2
6      8    Large group            2
7      4    Large group            2
8      6    Large group            2
9      9     Group of 1            2
10     0 Not applicable            0

I would like to get separate numbers for Large group and Group of 1.
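The merged numbers can be explained by inspecting the run-length encoding on its own. The sketch below uses only base R and rebuilds the `group` column as a plain vector (the names `is_group`, `runs`, and `attempt` are illustrative, not from the original code). Because `group != "Not applicable"` is a logical vector, adjacent "Group of 1" and "Large group" rows are all `TRUE` and collapse into a single run:

```r
# Same labels as example_data$group, kept as a plain character vector.
group <- c("Not applicable", "Large group", "Large group", "Not applicable",
           "Group of 1", "Large group", "Large group", "Large group",
           "Group of 1", "Not applicable")

# The comparison discards which kind of group a row belongs to.
is_group <- group != "Not applicable"

# Run-length encoding of the logical vector.
runs <- rle(is_group)
runs$values   # FALSE TRUE FALSE TRUE FALSE
runs$lengths  # 1 2 1 5 1

# Rows 5 to 9 form one TRUE run of length 5, so they all get the same number.
attempt <- rep(cumsum(runs$values) * runs$values, runs$lengths)
attempt       # 0 1 1 0 2 2 2 2 2 0
```

So the encoding has to be computed on the labels themselves rather than on the logical comparison if neighbouring groups of different kinds should be told apart.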

Polina B

1 Answer

library(dplyr)
example_data %>%
  mutate(gr = data.table::rleid(group)* (group != 'Not applicable'),
         gr = dense_rank(gr) - 1) # or even gr = as.numeric(factor(gr)) - 1

       value          group gr
    1      1 Not applicable  0
    2      3    Large group  1
    3      4    Large group  1
    4      6 Not applicable  0
    5      7     Group of 1  2
    6      8    Large group  3
    7      4    Large group  3
    8      6    Large group  3
    9      9     Group of 1  4
    10     0 Not applicable  0
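To see why this works, the three steps can be run separately. The following is a minimal sketch in base R only, assuming `rle()` plus `rep()` as a stand-in for `data.table::rleid()` and `match()` against the sorted unique values as a stand-in for `dense_rank()`; the names `run_id`, `masked`, and `group_number` are illustrative.

```r
# Same labels as example_data$group, kept as a plain character vector.
group <- c("Not applicable", "Large group", "Large group", "Not applicable",
           "Group of 1", "Large group", "Large group", "Large group",
           "Group of 1", "Not applicable")

# Step 1: one id per run of identical consecutive labels
# (assumed base R equivalent of data.table::rleid(group)).
r <- rle(group)
run_id <- rep(seq_along(r$lengths), r$lengths)
run_id        # 1 2 2 3 4 5 5 5 6 7

# Step 2: multiplying by the logical condition zeroes out "Not applicable" rows.
masked <- run_id * (group != "Not applicable")
masked        # 0 2 2 0 4 5 5 5 6 0

# Step 3: dense ranking closes the gaps; 0 gets rank 1, so subtract 1.
group_number <- match(masked, sort(unique(masked))) - 1
group_number  # 0 1 1 0 2 3 3 3 4 0
```

One caveat that follows from step 3: the `- 1` relies on at least one "Not applicable" row being present. If the data contains none, the smallest run id gets rank 1, and the first group would be numbered 0 instead of 1.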
Julien
Onyambu
  • A denser syntax: `example_data %>% mutate(gr = (data.table::rleid(group) * (group != 'Not applicable')) %>% dense_rank() - 1)` – Julien Jul 26 '22 at 21:48
  • @Julien this creates a different output for gr: 0, 3, 3, 2, 7, 9, 9, 9, 11, 6 – Polina B Jul 26 '22 at 21:52
  • It's the same output: `identical(example_data %>% mutate(gr = (data.table::rleid(group) * (group != 'Not applicable')) %>% dense_rank() - 1), example_data %>% mutate(gr = data.table::rleid(group) * (group != 'Not applicable'), gr = dense_rank(gr) - 1))` – Julien Jul 26 '22 at 21:55
  • Thanks, I found where I made an error. – Polina B Jul 26 '22 at 22:12
  • @onyambu I updated my question after trying the solution on the real data. I would appreciate it if you could suggest how to change the code to fit the updated task. – Polina B Jul 27 '22 at 03:05
  • @PolinaB that should be a completely different question. You should ask it separately rather than edit this one. Consider this question closed, since the answer given has already solved the issue at hand. Please revert the question to what it was, then ask about your current problem as a new question. – Onyambu Jul 27 '22 at 03:11