Random sampling until sum is reached

Question

I am relatively new to R and want to randomly sample rows from a data frame that has a column of area values. How can I keep sampling rows until the sum of the areas reaches a certain value (or comes close to it)? I tried the code below, taken from a previous question similar to mine, but the sum of the sampled areas is not always within the range set in the code.

sample <- function(df) {
  s1 <- df[sample(rownames(df), 1), ]
  s11 <- sum(s1$Area)
  while (s11 < 43900000) {
    rn2 <- rownames(df[!(rownames(df) %in% rownames(s1)), ])
    nr <- df[sample(rn2, 1), ]
    s11 <- sum(rbind(s1, nr)$Area)
    if (s11 > 43800000) {
      break()
    }
    s1 <- rbind(s1, nr)
  }
  return(s1)
}

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Jul 14 '20 at 22:14
Also: 1. You shouldn't need the if after the while, as the while loop only executes while your condition is true, so you don't need to break out of it. 2. Your break with the if occurs after the values are summed, so summed values that exceed the while loop's limit are still retained. — Michelle, Jul 14 '20 at 22:17
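
To illustrate the points in the comment above, here is a minimal sketch of a loop that tests the running total before a row is kept, so the result cannot exceed the limit. The name `sample_until` and the `upper` argument are hypothetical and do not come from the question. The sketch also avoids calling the function `sample`, since a user-defined `sample` masks base R's `sample()` inside its own body.

```r
# Hypothetical helper: draw rows without replacement until the next row
# would push the total Area above `upper`.
sample_until <- function(df, upper) {
  candidates <- sample(nrow(df))  # row indices in random order
  picked <- integer(0)
  total <- 0
  for (i in candidates) {
    new_total <- total + df$Area[i]
    if (new_total > upper) break  # test before keeping the row
    picked <- c(picked, i)
    total <- new_total
  }
  df[picked, , drop = FALSE]      # keep the data frame structure
}

set.seed(1)
df <- data.frame(Area = runif(200))
res <- sample_until(df, upper = 20)
sum(res$Area)  # never above 20
```

This stops at the first row that would overshoot, so it guarantees only the upper bound. How close the total gets to the limit depends on the size of the individual areas.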

score 0 · Answer 1 · answered Jul 14 '20 at 22:54

Let's create a small example. Assuming you have a data.frame with a column named Area and you want to sample those data until the sum is close to 100:

set.seed(123)
df <- data.frame(Area = runif(500))

my_samp <- sample(1:nrow(df))  # sample the rows of df

my_samp[which(cumsum(df[my_samp, 1]) < 100)]  # return the sampled rows with cumulative sum < 100

So

sum(df[my_samp[which(cumsum(df[my_samp, 1]) < 100)],1])

returns

[1] 99.89822
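
The same idea can be wrapped in a small reusable function. This is only a sketch: the name `sample_to_target` and the arguments `target` and `col` are assumptions made for illustration, and it relies on the areas being non-negative so that the cumulative sum never decreases.

```r
# Hypothetical wrapper around the shuffle-and-cumsum approach.
# Assumes the values in `col` are non-negative.
sample_to_target <- function(df, target, col = "Area") {
  ord <- sample(nrow(df))              # random permutation of row indices
  running <- cumsum(df[[col]][ord])    # running total in that order
  df[ord[running < target], , drop = FALSE]  # rows kept while total < target
}

set.seed(123)
df <- data.frame(Area = runif(500))
picked <- sample_to_target(df, target = 100)
sum(picked$Area)  # stays below 100
```

Because the rows are shuffled once and then cut off where the running total reaches the target, no row is drawn twice and no explicit loop is needed.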
