I have a data frame called test.data with a column called Ethnicity. There are three ethnic groups (more in the actual data): Adygei, Balochi and Biaka_Pygmies. I want to subset this data frame so that it contains only two randomly chosen samples (rows) from each ethnic group, giving something like the result shown below. How can I do this in R?
test.data <- structure(list(Sample = c("1793102418_A", "1793102460_A", "1793102500_A",
"1793102576_A", "1749751113_A", "1749751187_A", "1749751189_A",
"1749751285_A", "1749751356_A", "1749751195_A", "1749751218_A",
"1775705355_A"), Ethnicity = c("Adygei", "Adygei", "Adygei",
"Adygei", "Balochi", "Balochi", "Balochi", "Balochi", "Balochi",
"Biaka_Pygmies", "Biaka_Pygmies", "Biaka_Pygmies"), Height = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Sample", "Ethnicity",
"Height"), row.names = c("1793102418_A", "1793102460_A", "1793102500_A",
"1793102576_A", "1749751113_A", "1749751187_A", "1749751189_A",
"1749751285_A", "1749751356_A", "1749751195_A", "1749751218_A",
"1775705355_A"), class = "data.frame")
result
                   Sample     Ethnicity Height
1793102418_A 1793102418_A        Adygei      0
1793102460_A 1793102460_A        Adygei      0
1749751189_A 1749751189_A       Balochi      0
1749751285_A 1749751285_A       Balochi      0
1749751195_A 1749751195_A Biaka_Pygmies      0
1775705355_A 1775705355_A Biaka_Pygmies      0
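
For reference, here is a minimal base R sketch of the kind of thing I am after. It is one possible approach, not necessarily the best one. The helper name sample_per_group is hypothetical (my own, not from any package), and the data frame is rebuilt in compact form only so that the snippet runs on its own. The idea is to split the row indices by Ethnicity, draw two indices from each group, and subset the original data frame with them.

```r
# Compact rebuild of test.data (same values as the dput output above),
# included only so this snippet is self-contained.
test.data <- data.frame(
  Sample = c("1793102418_A", "1793102460_A", "1793102500_A", "1793102576_A",
             "1749751113_A", "1749751187_A", "1749751189_A", "1749751285_A",
             "1749751356_A", "1749751195_A", "1749751218_A", "1775705355_A"),
  Ethnicity = rep(c("Adygei", "Balochi", "Biaka_Pygmies"), times = c(4, 5, 3)),
  Height = 0,
  stringsAsFactors = FALSE
)
rownames(test.data) <- test.data$Sample

# Hypothetical helper: draw up to n random rows from each level of group_col.
sample_per_group <- function(df, group_col, n = 2) {
  # Row indices of df, split into one integer vector per group.
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    # Sample positions rather than values, so a group with a single row
    # is handled correctly; groups smaller than n return all their rows.
    idx[sample.int(length(idx), size = min(n, length(idx)))]
  })
  # Sort so the selected rows keep their original order in df.
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(1)  # only to make the random draw reproducible
result <- sample_per_group(test.data, "Ethnicity", n = 2)
print(result)
```

Which rows are picked depends on the seed, so the output will not necessarily match the result above row for row. If dplyr (version 1.0.0 or later) is an option, I assume grouping with group_by(Ethnicity) and then calling slice_sample(n = 2) would do the same job.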