Assuming you want to sample 30 rows from each group, you can do the following. Unfortunately, dplyr's sample_n
cannot handle the case where a group has fewer rows than you want to sample (unless you sample with replacement).
Here df is your data.frame:
Solution 1:
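library(dplyr)
df %>% group_by(Nationality) %>%
  sample_n(30, replace=TRUE) %>%
  # remove repeated rows for nationalities with fewer than 30 rows; note that
  # larger groups can also lose rows drawn twice and end up with fewer than 30
  distinct() %>%
  # funs() is deprecated since dplyr 0.8.0, so pass the function directly
  summarise_at(vars(Age, Overall, Passing), mean)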
Solution 2:
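df %>% split(.$Nationality) %>%
  lapply(function(x) {
    # keep small groups whole, since sample_n() would fail on them
    if (nrow(x) < 30)
      return(x)
    x %>% sample_n(30, replace=FALSE)
  }) %>%
  bind_rows() %>%  # bind_rows() accepts a list of data frames directly
  group_by(Nationality) %>%
  # funs() is deprecated since dplyr 0.8.0, so pass the function directly
  summarise_at(vars(Age, Overall, Passing), mean)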
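If you are on dplyr 1.0.0 or later, the same idea as Solution 2 can probably be written more compactly with slice_sample(), which, as far as I know, silently truncates n to the group size when a group is smaller. The sketch below is only an illustration: the toy df and its values are made up, and only the column names are taken from the code above.

```r
library(dplyr)

set.seed(1)  # make the random sample reproducible

# Hypothetical toy data: nationality "A" has 50 players, "B" has only 5
df <- data.frame(
  Nationality = rep(c("A", "B"), times = c(50, 5)),
  Age         = sample(18:35, 55, replace = TRUE),
  Overall     = sample(50:90, 55, replace = TRUE),
  Passing     = sample(40:90, 55, replace = TRUE)
)

# Sample up to 30 rows per group without replacement;
# groups with fewer than 30 rows are kept whole
sampled <- df %>%
  group_by(Nationality) %>%
  slice_sample(n = 30) %>%
  ungroup()

# Per-nationality means of the sampled rows
result <- sampled %>%
  group_by(Nationality) %>%
  summarise(across(c(Age, Overall, Passing), mean))
```

You can check the group sizes with count(sampled, Nationality) before trusting the means.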
Naturally, this comes without guarantee, as you did not supply a working example.