I am trying to write a function that uses a loop to reduce the size of a data set. I want to sample equally from each of the four subgroups in my data frame (all of equal size). I am having trouble coming up with code that samples n-1 rows from each subgroup, where n is the current number of rows in that subgroup. My current code is as follows:
sub.df <- function(x) {
  library(data.table)
  library(tidyverse)
  setDT(x)
  while (nrow(x) > 24) {
    x.1 <- x %>% # this is the beginning of the sample part
      group_by(x$spiral) %>%
      tally() %>% select(-n) %>%
      sample_n(x, nrow(x) - 1, replace = FALSE) # this is where I have trouble
    ks <- ks.test(dist(x[, c(1, 2)]), unif.null) # this part is for evaluating the exclusions
    ks.1 <- ks.test(dist(x.1[, c(1, 2)]), unif.null)
    if (ks.1$statistic > ks$statistic) {x <- x.1} else {x <- x}
  }
}
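For illustration, the "n-1 rows per subgroup" step could be sketched in base R roughly as follows. This is only a sketch under assumptions: the data is a plain data.frame (not a data.table), the grouping column is called `subgroup` as in the example data in this post (the code above uses `spiral`), and the helper name `drop_one_per_group` is made up for the example.

```r
# Hypothetical helper: drop one randomly chosen row from every group,
# i.e. keep n - 1 of the n rows currently in each group.
# Assumes `df` is a plain data.frame and `group_col` names the grouping column.
drop_one_per_group <- function(df, group_col = "subgroup") {
  # Row indices of `df`, split into one integer vector per group
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  keep <- unlist(lapply(idx_by_group, function(idx) {
    if (length(idx) == 0) return(idx)   # empty factor level: nothing to drop
    idx[-sample.int(length(idx), 1)]    # remove one random position
  }), use.names = FALSE)
  df[sort(keep), ]                      # sort() keeps the original row order
}

# Example data copied from this post
df <- data.frame(
  x.cord   = c(1, 1, 3, 2, 2, 3, 3, 1, -2, -4, -5, -2, -3, -1, -2, -4),
  y.cord   = c(1, 4, 5, 1, -3, -1, -2, -3, -2, -1, -5, -1, 4, 1, 5, 3),
  subgroup = rep(1:4, each = 4)
)

set.seed(1)                        # only to make the example reproducible
smaller <- drop_one_per_group(df)
table(smaller$subgroup)            # 3 rows left in each of the 4 subgroups
```

Working on row indices rather than on the values themselves avoids the pitfall that `sample(x, ...)` treats a single number `x` as the range `1:x`.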
An example of the data:
x.cord y.cord subgroup
1 1 1
1 4 1
3 5 1
2 1 1
2 -3 2
3 -1 2
3 -2 2
1 -3 2
-2 -2 3
-4 -1 3
-5 -5 3
-2 -1 3
-3 4 4
-1 1 4
-2 5 4
-4 3 4
Now, if the loop ran correctly, the first iteration would sample 3 (4-1) rows from each subgroup, the next would sample 2 (3-1), and the last would sample 1 (2-1). So my final data would be something like:
x.cord y.cord subgroup
3 5 1
1 -3 2
-5 -5 3
-4 3 4
With the code above, my actual data set would end up with 24 points, 6 from each subgroup, but this should hopefully illustrate what I am trying to do.
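The whole shrinking loop described above might be sketched like this, again in base R and only under assumptions. The names `drop_one_per_group`, `thin_groups`, `score` and `max_tries` are hypothetical; `score` stands in for whatever accept/reject criterion is used (in the code above, the KS statistic against `unif.null`, which is not shown in this post). The toy score in the example accepts every candidate, so it only demonstrates the 4, 3, 2, 1 progression per subgroup.

```r
# Hypothetical helper (repeated so this block runs on its own):
# keep n - 1 randomly chosen rows in every group of a plain data.frame.
drop_one_per_group <- function(df, group_col) {
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  keep <- unlist(lapply(idx_by_group, function(idx) {
    if (length(idx) == 0) return(idx)
    idx[-sample.int(length(idx), 1)]
  }), use.names = FALSE)
  df[sort(keep), ]
}

# Shrink `df` until it has at most `target_rows` rows. A candidate is accepted
# only if score(candidate) > score(df); otherwise a new candidate is drawn.
# `max_tries` is an assumed safety cap so the loop cannot run forever.
thin_groups <- function(df, group_col, target_rows, score, max_tries = 1000) {
  tries <- 0
  while (nrow(df) > target_rows && tries < max_tries) {
    tries <- tries + 1
    candidate <- drop_one_per_group(df, group_col)
    if (score(candidate) > score(df)) df <- candidate
  }
  df  # return the reduced data explicitly
}

# Example data copied from this post
df <- data.frame(
  x.cord   = c(1, 1, 3, 2, 2, 3, 3, 1, -2, -4, -5, -2, -3, -1, -2, -4),
  y.cord   = c(1, 4, 5, 1, -3, -1, -2, -3, -2, -1, -5, -1, 4, 1, 5, 3),
  subgroup = rep(1:4, each = 4)
)

# Toy score that always prefers the smaller candidate, for demonstration only.
# With the criterion from the code above it would instead be something like:
#   score <- function(d) ks.test(dist(d[, c(1, 2)]), unif.null)$statistic
accept_all <- function(d) -nrow(d)

set.seed(42)                       # only to make the example reproducible
res <- thin_groups(df, "subgroup", target_rows = 4, score = accept_all)
table(res$subgroup)                # one row left in each of the 4 subgroups
```

Returning `df` on the last line matters: a function whose last expression is a `while` loop returns `NULL`, so the reduced data would otherwise be lost.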