
I have a large healthcare data set called oct:

Providers  ID date ICD
Billy  4504 9/11 f.11
Billy  5090 9/10 r.05
Max   4430  9/01 k.11
Mindy 0812 9/30  f.11 
etc. 

I want a random sample of ID numbers for each provider. I have tried:

review <- oct %>% group_by(Providers) %>% do (sample(oct$ID, size = 5, replace= FALSE, prob = NULL))
Taher A. Ghaleb
  • `do()` will return an error if it does not return a data frame and you don't name the output. I can't be sure this will work for you unless you give an example of your data in a format I can copy and paste into R, but try this: `review <- oct %>% group_by(Providers) %>% do (ID_sample = sample(ID, size = 5, replace= FALSE, prob = NULL))` – qdread Dec 03 '18 at 16:50
  • Check out `dplyr`'s `sample_n`; you can use that with groups. – zack Dec 03 '18 at 16:50
  • Does this answer your question? [Take randomly sample based on groups](https://stackoverflow.com/questions/18258690/take-randomly-sample-based-on-groups) – camille Dec 24 '21 at 15:47

1 Answer

Example using dplyr::sample_n on the built-in mtcars data, sampling 3 rows per group:

library(dplyr)
set.seed(1)
mtcars %>% group_by(cyl) %>% sample_n(3)

# A tibble: 9 x 11
# Groups:   cyl [3]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
2  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
3  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
4  19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
5  21       6 160     110  3.9   2.88  17.0     0     1     4     4
6  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
7  15       8 301     335  3.54  3.57  14.6     0     1     5     8
8  15.5     8 318     150  2.76  3.52  16.9     0     0     3     2
9  14.7     8 440     230  3.23  5.34  17.4     0     0     3     4
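
As a side note, `sample_n()` has been superseded since dplyr 1.0.0 by `slice_sample()`. The following is a minimal sketch of the same grouped sampling with the newer verb, assuming dplyr >= 1.0.0 is installed; the name `sampled` is only for illustration. The rows drawn need not match the output above even with the same seed.

```r
library(dplyr)

set.seed(1)

# Draw 3 rows at random, without replacement, from each cyl group
sampled <- mtcars %>%
  group_by(cyl) %>%
  slice_sample(n = 3) %>%
  ungroup()

# Number of sampled rows per group
count(sampled, cyl)
```
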

If you'd like to just select a specific variable (ID in your question):

set.seed(1)

mtcars %>% 
  group_by(cyl) %>% 
  sample_n(3) %>%
  pull(mpg)

[1] 22.8 32.4 33.9 19.7 21.0 19.2 15.0 15.5 14.7
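
Applied to data shaped like the question's, the same idea might look like the sketch below. The `oct` tibble here is a toy stand-in built only from the four rows shown in the question, and storing `ID` as character is an assumption made to keep the leading zero in `0812`. It uses `slice_sample()` (dplyr >= 1.0.0) rather than `sample_n()`, because some providers have fewer than 5 rows: as far as I know, `sample_n(5)` stops with an error in that case when sampling without replacement, whereas `slice_sample(n = 5)` returns every row of the smaller groups.

```r
library(dplyr)

# Toy stand-in for `oct`, built from the rows shown in the question.
# ID is kept as character so the leading zero in "0812" is preserved.
oct <- tibble::tribble(
  ~Providers, ~ID,    ~date,  ~ICD,
  "Billy",    "4504", "9/11", "f.11",
  "Billy",    "5090", "9/10", "r.05",
  "Max",      "4430", "9/01", "k.11",
  "Mindy",    "0812", "9/30", "f.11"
)

set.seed(1)

# Up to 5 randomly chosen IDs per provider, sampled without replacement;
# providers with fewer than 5 rows contribute all of their rows.
review <- oct %>%
  group_by(Providers) %>%
  slice_sample(n = 5) %>%
  ungroup() %>%
  select(Providers, ID)

review
```

On the real data, every provider with at least 5 rows would contribute exactly 5 IDs. Adding `pull(ID)` at the end would return a bare vector instead, at the cost of losing the provider labels.
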
zack