I have the following dataset

id1 <- 1:20
# 12 observations with status 1, 8 with status 2
status <- rep(c(1, 2), times = c(12, 8))
df <- data.frame(id1, status)

In df, status is 2 for 40% of the observations (8 of 20). I am looking for a function that draws a sample of 10 observations from df while preserving that proportion.

I have already seen "stratified random sampling from data frame in R", but it does not address keeping the proportions.

AliCivil

1 Answer

You can try the stratified function from my "splitstackshape" package:

library(splitstackshape)
stratified(df, "status", 10/nrow(df))
#     id1 status
#  1:   5      1
#  2:  12      1
#  3:   2      1
#  4:   1      1
#  5:   6      1
#  6:   9      1
#  7:  16      2
#  8:  17      2
#  9:  18      2
# 10:  15      2

Alternatively, using sample_frac from "dplyr":

library(dplyr)

df %>%
  group_by(status) %>%
  sample_frac(10/nrow(df))

Both of these draw the same fraction from each status group, so the sample keeps the proportions of the original grouping variable. The fraction is 10/nrow(df), or, equivalently, 0.5, which gives 6 rows with status 1 and 4 rows with status 2.
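To make the mechanism concrete, here is a minimal base R sketch of the same idea: split the row indices by status, draw the same fraction from each group, and recombine. It is not how either package is implemented, only an illustration; the names frac, rows_by_status, picked and smp, and the seed, are arbitrary choices of mine.

```r
# Rebuild the example data from the question.
df <- data.frame(id1 = 1:20, status = rep(c(1, 2), times = c(12, 8)))

set.seed(42)               # arbitrary seed, only for reproducibility
frac <- 10 / nrow(df)      # 0.5

# Row indices grouped by stratum.
rows_by_status <- split(seq_len(nrow(df)), df$status)

# Draw round(group size * frac) rows from each stratum, without replacement.
# Indexing idx by sample.int() avoids the sample() pitfall with length-1 input.
picked <- unlist(lapply(rows_by_status, function(idx) {
  idx[sample.int(length(idx), size = round(length(idx) * frac))]
}), use.names = FALSE)

smp <- df[picked, ]

table(smp$status)              # 6 rows with status 1, 4 with status 2
prop.table(table(smp$status))  # 0.6 and 0.4, as in df
```

With group sizes that do not divide evenly, the rounding step can make the total differ slightly from the target sample size, so it is worth checking nrow(smp) in that case.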

A5C1D2H2I1M1N2O1R2T1
  • Quick question, @Ananda Mahto: is this sampling done with replacement? – AliCivil May 22 '15 at 03:43
  • @AliCivil, you can specify this with the replace argument (default=FALSE - see the help of the function). Add `replace=TRUE` if you want to have replacement: `stratified(df, "status", 10/nrow(df), replace=TRUE)`. – Lennert Feb 15 '17 at 13:45
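
For the dplyr variant from the answer, sample_frac also takes a replace argument (FALSE by default), so sampling with replacement might look like the sketch below. The seed and the name smp_wr are my own choices; in recent dplyr versions slice_sample(prop = ..., replace = TRUE) is the newer way to write the same thing.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(id1 = 1:20, status = rep(c(1, 2), times = c(12, 8)))

set.seed(1)  # arbitrary seed, only for reproducibility

# Same stratified draw as in the answer, but rows may be picked more than once.
smp_wr <- df %>%
  group_by(status) %>%
  sample_frac(10 / nrow(df), replace = TRUE)

count(smp_wr, status)         # group sizes stay at 6 and 4
any(duplicated(smp_wr$id1))   # TRUE if some row was drawn more than once
```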