
I am trying to create a unique, randomly assigned (without replacement) group id without using a for loop. This is as far as I got:

library(datasets)
library(dplyr)

data(iris)

iris <- iris %>% group_by(Species) %>% mutate(id = cur_group_id())

This gives me a group id for each iris$Species; however, I would like the group id to be randomly assigned from c(1,2,3) rather than assigned based on the order of the dataset.

Any help creating this would be very helpful! I am sure there is a way to do this with dplyr but I am stumped...

ThomasIsCoding

4 Answers


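Maybe you can play a trick on group_by by adding a sample operation: shuffling the factor levels randomises the group order, and with it the ids that cur_group_id() returns, e.g.,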

iris <- iris %>%
  group_by(factor(Species, levels = sample(levels(Species)))) %>%
  mutate(id = cur_group_id())
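
One side effect of grouping by an unnamed expression is that the result stays grouped and carries an extra column named after the whole expression. A minimal sketch of the same idea that names the helper column and drops it afterwards (the names `.grp` and `iris_ids` and the seed are assumptions made for illustration; `cur_group_id()` needs dplyr >= 1.0.0):

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the result is reproducible

iris_ids <- iris %>%
  # Group by a copy of Species whose levels are in random order (.grp is a hypothetical name)
  group_by(.grp = factor(Species, levels = sample(levels(Species)))) %>%
  mutate(id = cur_group_id()) %>%
  ungroup() %>%
  select(-.grp)  # drop the helper column again

# One row per Species with its randomly assigned id
distinct(iris_ids, Species, id)
```

The row order of iris is unchanged; only the id column is added.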
ThomasIsCoding

Here's an example that draws one random number per group and then ranks those numbers to get the ids.

library(datasets)
library(dplyr)

data(iris)

df <- iris %>% 
  group_by(Species) %>%
  mutate(id = runif(1,0,1)) %>% 
  ungroup() %>% 
  mutate(id = dense_rank(id))

df %>% sample_n(10)
#> # A tibble: 10 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species       id
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>      <int>
#>  1          4.4         3            1.3         0.2 setosa         3
#>  2          6.5         3            5.5         1.8 virginica      2
#>  3          6.3         2.7          4.9         1.8 virginica      2
#>  4          5           3.6          1.4         0.2 setosa         3
#>  5          6.3         2.3          4.4         1.3 versicolor     1
#>  6          7.9         3.8          6.4         2   virginica      2
#>  7          5.4         3.9          1.7         0.4 setosa         3
#>  8          5.7         4.4          1.5         0.4 setosa         3
#>  9          6.4         2.8          5.6         2.2 virginica      2
#> 10          5.2         3.4          1.4         0.2 setosa         3

Created on 2020-07-29 by the reprex package (v0.3.0)

Ryan John

Here's an approach with sample and recode:

  1. Use seq_len(n_distinct(iris$Species)) to create the vector of integer ids to recode.
  2. Use sample to shuffle those ids into a random order.
  3. Use setNames to name the shuffled ids with the original ids they replace.
  4. Use !!! to splice that named vector into recode as individual arguments.
  5. Use recode to change the values.
# Build the lookup outside the pipeline: !!! is evaluated before mutate() sees the data.
ids <- seq_len(n_distinct(iris$Species))
lookup <- setNames(sample(ids), ids)

iris %>%
  group_by(Species) %>%
  mutate(id = cur_group_id()) %>%
  ungroup() %>%
  mutate(id = recode(id, !!!lookup))

I think the other answers are better approaches, but having recode with !!! in your toolkit is helpful in other situations.

Ian Campbell

Randomise the rows and then assign id based on the order in which each Species first occurs:

library(dplyr)

iris %>%
  slice_sample(n = nrow(.)) %>%
  #sample_n for dplyr < 1.0.0
  #sample_n(n()) %>%
  mutate(id = match(Species, unique(Species)))
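
If the original row order should be kept, the same match idea can be applied to a shuffled vector of species names instead of shuffled rows. This is a sketch under assumptions: the names `shuffled` and `iris_matched` and the seed are illustrative only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the result is reproducible

# Shuffle the species names once; a name's position in `shuffled` becomes its id
shuffled <- sample(levels(iris$Species))

iris_matched <- iris %>%
  mutate(id = match(Species, shuffled))

# One row per Species with its randomly assigned id
distinct(iris_matched, Species, id)
```

match converts the factor to character before comparing, so Species can be passed in directly.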
Ronak Shah