I have a data frame with two categorical variables.
samples<-c("A","A","A","A","B","B")
groups<-c(1,1,1,2,1,1)
df<- data.frame(samples,groups)
df
samples groups
1 A 1
2 A 1
3 A 1
4 A 2
5 B 1
6 B 1
The result that I would like to have is for each given observation (sample-group) to downsample (randomly, this is important) the data frame to a maximum of X rows and keep all obervation for which appear less than X times. In the example here X=2. Is there an easy way to do this? The issue that I have is that observation 4 (A,2) appears only once, thus dplyr sample_n would not work.
desired output
samples groups
1 A 1
2 A 1
3 A 2
4 B 1
5 B 1