Create unique random group id in R

Question

I am trying to create a unique, randomly assigned (without replacement) group id without using a for loop. This is as far as I got:

library(datasets)
library(dplyr)

data(iris)

iris <- iris  %>% group_by(Species) %>% mutate(id = cur_group_id())

This gives me a group id for each iris$Species. However, I would like the group id to be randomly assigned from c(1,2,3) rather than based on the order of the dataset.

Any help creating this would be very helpful! I am sure there is a way to do this with dplyr but I am stumped...

Must they be 50,each? Or could you have different group sizes? — Onyambu, Jul 29 '20 at 22:35

score 5 · Accepted Answer · answered Jul 29 '20 at 22:44

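Maybe you can play a trick on group_by by adding a sample operation. cur_group_id() numbers the groups in the order of the factor levels, so shuffling the levels shuffles the ids, e.g.,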

iris <- iris %>%
  group_by(factor(Species, levels = sample(levels(Species)))) %>%
  mutate(id = cur_group_id())
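To see the mechanism without dplyr, here is a minimal base R sketch of the same idea. The names shuffled_levels and id are illustrative and not part of the answer, and set.seed() is only there so the run can be repeated. The integer code of a factor follows its level order, so shuffling the levels shuffles the codes.

```r
# Base R sketch: shuffle the factor levels, then read off the integer
# code of each row's level. Row order is left untouched.
data(iris)
set.seed(42)  # assumption: fixed seed, only for reproducibility

shuffled_levels <- sample(levels(iris$Species))
id <- as.integer(factor(iris$Species, levels = shuffled_levels))

# Cross-tabulate to confirm each species maps to exactly one id.
table(iris$Species, id)
```

Unlike the pipeline above, this does not add a helper grouping column to the data; whether that matters depends on what you do with the result.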

ThomasIsCoding


score 2 · Answer 2 · answered Jul 29 '20 at 22:40

Here's an answer that draws one random number per group and then ranks those numbers.

library(datasets)
library(dplyr)

data(iris)

df <- iris %>% 
  group_by(Species) %>%
  mutate(id = runif(1,0,1)) %>% 
  ungroup() %>% 
  mutate(id = dense_rank(id))

df %>% sample_n(10)
#> # A tibble: 10 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species       id
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>      <int>
#>  1          4.4         3            1.3         0.2 setosa         3
#>  2          6.5         3            5.5         1.8 virginica      2
#>  3          6.3         2.7          4.9         1.8 virginica      2
#>  4          5           3.6          1.4         0.2 setosa         3
#>  5          6.3         2.3          4.4         1.3 versicolor     1
#>  6          7.9         3.8          6.4         2   virginica      2
#>  7          5.4         3.9          1.7         0.4 setosa         3
#>  8          5.7         4.4          1.5         0.4 setosa         3
#>  9          6.4         2.8          5.6         2.2 virginica      2
#> 10          5.2         3.4          1.4         0.2 setosa         3

Created on 2020-07-29 by the reprex package (v0.3.0)
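For comparison, the same two steps (one random draw per group, then a dense rank) can be sketched in base R. This is an illustration under my own naming (draw, id), not the answer's code; ave() repeats the single draw across all rows of a species, and match() against the sorted distinct draws plays the role of dense_rank().

```r
data(iris)
set.seed(1)  # assumption: fixed seed, only for reproducibility

# One uniform draw per species, repeated on every row of that species.
draw <- ave(numeric(nrow(iris)), iris$Species, FUN = function(x) runif(1))

# Dense rank: position of each row's draw among the sorted distinct draws.
id <- match(draw, sort(unique(draw)))

# Check that each species received a single id.
tapply(id, iris$Species, unique)
```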

That is a great trick. I had never seen dense_rank before. — itsMeInMiami, Jul 29 '20 at 22:47

Ian Campbell · Answer 3 · 2020-07-29T22:53:23.117

Here's an approach with sample and recode:

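Build a named lookup vector before the pipeline: !!! is evaluated when mutate captures the call, so it cannot see the id column and has to refer to an object that already exists.
Use seq_len(n_distinct(iris$Species)) to create the vector of integer ids to recode.
Use sample to shuffle those ids.
Use setNames to name each shuffled id with the original id it replaces.
Use !!! to splice the named vector into recode as individual old = new arguments.
Use recode to change the values.

ids <- seq_len(n_distinct(iris$Species))
lookup <- setNames(sample(ids), ids)  # names: old id, values: new random id

iris %>%
  group_by(Species) %>%
  mutate(id = cur_group_id()) %>%
  mutate(id = recode(id, !!!lookup))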
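Recoding through a named vector comes down to a lookup by name, which can be sketched in base R as follows. The names ids, lookup, old_id and new_id are illustrative, and old_id assumes that cur_group_id() matches the factor's integer codes, which holds when every level of Species is present in the data.

```r
data(iris)
set.seed(7)  # assumption: fixed seed, only for reproducibility

ids <- seq_len(nlevels(iris$Species))
lookup <- setNames(sample(ids), ids)   # names: old id, values: new id

old_id <- as.integer(iris$Species)     # stand-in for cur_group_id()
new_id <- unname(lookup[as.character(old_id)])

# The mapping is a permutation: distinct old ids get distinct new ids.
unique(data.frame(old_id, new_id))
```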

I think the other answers are better approaches, but having recode with !!! in your toolkit is helpful in other situations.

score 1 · Answer 4 · answered Jul 30 '20 at 00:07

Randomise the rows, then assign id based on the order in which each Species first appears (note that the rows of the result stay shuffled):

library(dplyr)

iris %>%
  slice_sample(n = nrow(.)) %>%
  #sample_n for dplyr < 1.0.0
  #sample_n(n()) %>%
  mutate(id = match(Species, unique(Species)))
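If the original row order needs to be kept, a variant is to shuffle only the distinct species and match against that. This is a base R sketch rather than the answer's own code, and shuffled_species and id are illustrative names.

```r
data(iris)
set.seed(123)  # assumption: fixed seed, only for reproducibility

# Shuffle the distinct species instead of the rows.
shuffled_species <- sample(unique(as.character(iris$Species)))

# The id of a row is the position of its species in the shuffled vector.
id <- match(iris$Species, shuffled_species)

table(iris$Species, id)
```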

Ronak Shah

