Self-promotion alert. I wrote a function that allows convenient stratified sampling, and I've included an option to subset levels from the grouping variables before sampling.
The function is called stratified and can be used in the following ways:
set.seed(1)
# Proportional sample
stratified(mydf, group="gender", size=.2, select=list(gender = "F"))
#   gender age
# 4      F  29
# Fixed-size sampling
stratified(mydf, group="gender", size=2, select=list(gender = "F"))
#   gender age
# 4      F  29
# 5      F  31
You can specify multiple groups (for example, if your data frame included a "state" variable and you wanted to group by "state" and "gender", you would specify group = c("state", "gender")). You can also specify multiple "select" arguments (for example, if you wanted only female respondents from California and Texas, and your "state" variable used two-letter state abbreviations, you could specify select = list(gender = "F", state = c("CA", "TX"))).
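If it helps to see the idea spelled out, here is a rough base-R sketch of "subset the levels, then sample within each group combination". This is not the code of stratified itself: stratified_sketch and the toy data frame are hypothetical names made up for illustration, and the rule for sizes below 1 (treat as a proportion, round up) is an assumption that may differ from what the real function does.

```r
# Hypothetical stand-in for illustration only; NOT the actual stratified() source.
stratified_sketch <- function(df, group, size, select = NULL) {
  # Keep only the rows matching the requested levels of each "select" variable.
  if (!is.null(select)) {
    for (v in names(select)) {
      df <- df[df[[v]] %in% select[[v]], , drop = FALSE]
    }
  }
  # Split row positions by every observed combination of the grouping variables.
  idx <- split(seq_len(nrow(df)), df[group], drop = TRUE)
  picked <- lapply(idx, function(i) {
    # Assumption: size < 1 is a proportion (rounded up), otherwise a fixed count.
    n <- if (size < 1) ceiling(size * length(i)) else min(size, length(i))
    i[sample.int(length(i), n)]
  })
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

# Made-up data: three states, alternating gender, arbitrary ages.
set.seed(1)
toy <- data.frame(
  state  = rep(c("CA", "TX", "NY"), each = 6),
  gender = rep(c("F", "M"), times = 9),
  age    = 21:38,
  stringsAsFactors = FALSE
)

# Two female respondents from each of California and Texas.
out <- stratified_sketch(toy, group = c("state", "gender"), size = 2,
                         select = list(gender = "F", state = c("CA", "TX")))
```

With this toy data the call returns four rows, two per state, all with gender "F". Which rows get drawn depends on the seed.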
The function itself can be found here, or you can download and install the package (which gives you convenient access to the help pages and examples) by using install_github from the "devtools" package as follows:
# install.packages("devtools")
library(devtools)
# Current versions of devtools expect a single "username/repo" string
install_github("mrdwab/mrdwabmisc")