Hi Stack Overflow users,
I am new to R and have only been learning it for a couple of weeks. I have a data frame with 15 string variables describing people's characteristics (e.g. ethnicity, education, country of origin); each row is one person.
How can I tell R to create a subset of the original data frame such that the new data frame contains N randomly drawn people (drawn with replacement), where 50% of the N have Ethnicity ET and 50% of the N have Education ED? I already know the basics, A) and B):
A) I know how to draw N observations at random with replacement, as suggested here and here. For example:
df[sample(nrow(df), size=N, replace=TRUE), ]
B) In this other post, there are examples of how to condition the random draw (without replacement). For example, this draws N rows that match at least one of the two conditions:
df[sample(which(df$Ethnicity == "ET" | df$Education == "ED"), N), ]
However, I would like to know how to make more complex conditional draws: 50% of the N people must have Ethnicity ET, and 50% of the N people must have Education ED. In this new sample of size N, the two conditions therefore only partially overlap, and all four combinations can occur: Ethnicity==ET & Education==ED, Ethnicity!=ET & Education==ED, Ethnicity==ET & Education!=ED, and Ethnicity!=ET & Education!=ED.
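To make the target concrete, here is a minimal sketch of one way such a draw might be set up. It is only an illustration under several assumptions: N is even, the toy `df` and the helper name `draw_half_half` are hypothetical, and the size of the overlap `k` (people with both ET and ED) is picked uniformly at random, which is an arbitrary choice since nothing above fixes it. The idea is that if `k` rows have both properties, then `N/2 - k` rows must have ET only, `N/2 - k` rows ED only, and `k` rows neither, so both margins come out at exactly N/2.

```r
# Toy data standing in for the real data frame (hypothetical values).
df <- expand.grid(
  Ethnicity = c("ET", "E2", "E3"),
  Education = c("ED", "D2", "D3"),
  stringsAsFactors = FALSE
)
df$id <- seq_len(nrow(df))

# Hypothetical helper: draw N rows with replacement so that exactly N/2 have
# Ethnicity == "ET" and exactly N/2 have Education == "ED". Assumes N is even.
draw_half_half <- function(df, N, k = NULL) {
  stopifnot(N %% 2 == 0)
  half <- N / 2
  et <- df$Ethnicity == "ET"
  ed <- df$Education == "ED"

  # Row indices of the four possible combinations.
  cells <- list(
    both    = which(et & ed),
    et_only = which(et & !ed),
    ed_only = which(!et & ed),
    neither = which(!et & !ed)
  )

  # k = number of drawn rows with both properties (assumption: uniform on 0..N/2).
  if (is.null(k)) k <- sample.int(half + 1, 1) - 1
  counts <- c(both = k, et_only = half - k, ed_only = half - k, neither = k)

  idx <- unlist(lapply(names(cells), function(nm) {
    n <- counts[[nm]]
    if (n == 0) return(integer(0))
    pool <- cells[[nm]]
    if (length(pool) == 0) stop("no rows available in cell: ", nm)
    # Index into the pool via sample.int() so a one-element pool is handled
    # correctly (sample(x, ...) treats a single number x as 1:x).
    pool[sample.int(length(pool), n, replace = TRUE)]
  }))

  # Shuffle so the four groups are not stored in blocks.
  df[idx[sample.int(length(idx))], , drop = FALSE]
}

out <- draw_half_half(df, N = 10)
mean(out$Ethnicity == "ET")  # 0.5
mean(out$Education == "ED")  # 0.5
```

Passing `k` explicitly (for example `draw_half_half(df, N = 10, k = 2)`) would pin down how many people satisfy both conditions instead of leaving it random.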