That's a good question! I can't see an easy way to do this with the documented dplyr syntax, but how about this for a workaround?
sampleGroup <- function(df, x = 1) {
  # attr(df, "indices") holds the per-group row indices of a grouped
  # data frame; draw up to x rows from each group
  df[unlist(lapply(attr(df, "indices"),
                   function(r) sample(r, min(length(r), x)))), ]
}
sampleGroup(iris %.% group_by(Species), 3)
#Source: local data frame [9 x 5]
#Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#39 4.4 3.0 1.3 0.2 setosa
#16 5.7 4.4 1.5 0.4 setosa
#25 4.8 3.4 1.9 0.2 setosa
#51 7.0 3.2 4.7 1.4 versicolor
#62 5.9 3.0 4.2 1.5 versicolor
#59 6.6 2.9 4.6 1.3 versicolor
#148 6.5 3.0 5.2 2.0 virginica
#103 7.1 3.0 5.9 2.1 virginica
#120 6.0 2.2 5.0 1.5 virginica
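(Update: later dplyr releases did add native per-group sampling. Assuming a recent dplyr (>= 1.0.0), `slice_sample()` on a grouped data frame replaces the workaround; `sample_n()` is the older equivalent.)

```r
library(dplyr)

# slice_sample() draws n rows within each group of a grouped data frame
# (dplyr >= 1.0.0); sample_n(..., 3) is the pre-1.0 equivalent
iris %>%
  group_by(Species) %>%
  slice_sample(n = 3)
```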
EDIT - PERFORMANCE COMPARISON
Here's a test against data.table (both native and with an external function call, as in the example above) on 1 million rows and 26 groups.
Native data.table is about 2x as fast as both the dplyr workaround and the data.table version with a function call, so the dplyr workaround and data.table-with-callout are roughly on par.
Hopefully the dplyr guys will give us some native syntax for sampling soon! (or even better, maybe it's already there)
sampleGroup.dt <- function(df, size) {
  df[sample(nrow(df), size = size), ]
}
testdata <- data.frame(group = sample(letters, 10e5, replace = TRUE),
                       val = runif(10e5))
dti <- data.table(testdata)
# using the dplyr workaround with external function call
system.time(sampleGroup(testdata %.% group_by(group), 10))
#user system elapsed
#0.07 0.00 0.06
#using native data.table
system.time(dti[dti[, list(val = sample(.I, 10)), by = "group"]$val])
#user system elapsed
#0.04 0.00 0.03
#using data.table with external function call
system.time(dti[, sampleGroup.dt(.SD, 10), by = group])
#user system elapsed
#0.06 0.02 0.08
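One caveat on the native data.table line: `sample(.I, 10)` errors if any group has fewer than 10 rows. A sketch of a safer variant, capping the draw at the group size just as the dplyr workaround does with `min(length(r), x)`:

```r
library(data.table)

dti <- data.table(group = sample(letters, 10e5, replace = TRUE),
                  val   = runif(10e5))

# cap the per-group sample at .N so small groups don't raise
# "cannot take a sample larger than the population"
sampled <- dti[dti[, .(idx = sample(.I, min(.N, 10))), by = group]$idx]
```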