R repeat function until condition met

Question

I am trying to generate a random sample that excludes certain "bad data." I do not know whether the data is "bad" until after I sample it. Thus, I need to make a random draw from the population and then test it. If the data is "good" then keep it. If the data is "bad" then randomly draw another and test it. I would like to do this until my sample size reaches 25. Below is a simplified example of my attempt to write a function that does this. Can anyone please tell me what I am missing?

df <- data.frame(NAME=c(rep('Frank',10),rep('Mary',10)), SCORE=rnorm(20))
df

random.sample <- function(x) {
  x <- df[sample(nrow(df), 1), ]
  if (x$SCORE > 0) return(x)
 #if (x$SCORE <= 0) run the function again
}

random.sample(df)

I took a look at ?'while' and ?Control but had trouble understanding how to use it. — user1491868, Dec 10 '13 at 23:12
So, you have to do calculation after drawing? here, you already have `SCORE`, just subset those good and sample. — Ananta, Dec 10 '13 at 23:12
@Ananta Would that be still a random sample from the original population? — Alex Popov, Dec 10 '13 at 23:16
@aseidlitz It's using the same info ("SCORE") mentioned above. Unless there is something else not in the example, it just reduces to a simple subsetting problem. — Stephen Henderson, Dec 10 '13 at 23:19
@user1491868 well then both your data and your example are misleading... — Stephen Henderson, Dec 10 '13 at 23:20
In any case, precomputed or not, you can just put whatever your test is between the little square brackets; then it's vectorised and not looping over multiple function calls. — Stephen Henderson, Dec 10 '13 at 23:26
Thank you for the helpful comments and I apologize for any confusion. — user1491868, Dec 10 '13 at 23:39

flodel · Accepted Answer · 2013-12-11T03:46:47.187

Here is a general use of a while loop:

random.sample <- function(x) {
  success <- FALSE
  while (!success) {
    # do something
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # check for success
    success <- x$SCORE > 0
  }
  return(x)
}

An alternative is to use repeat (syntactic sugar for while(TRUE)) and break:

random.sample <- function(x) {
  repeat {
    # do something
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # exit if the condition is met
    if (x$SCORE > 0) break
  }
  return(x)
}

where break makes you exit the repeat block. Alternatively, you could have if (x$SCORE > 0) return(x) to exit the function directly.
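The loops above return a single accepted row, while the question asks for a sample of 25. A minimal sketch of one way to extend the same idea is shown here. The names `is_good`, `collect_good`, and `max_tries` are hypothetical and not part of the original answer, and rows are drawn with replacement because the example data has only 20 rows.

```r
set.seed(1)  # only to make the sketch reproducible
df <- data.frame(NAME = c(rep("Frank", 10), rep("Mary", 10)), SCORE = rnorm(20))

# Hypothetical test applied to one drawn row; swap in the real check.
is_good <- function(row) row$SCORE > 0

# Draw one row at a time (with replacement) until n_wanted rows pass the test.
# max_tries is an assumed safety cap so the loop cannot run forever.
collect_good <- function(data, n_wanted, max_tries = 10000) {
  kept <- vector("list", n_wanted)
  n_kept <- 0
  tries <- 0
  while (n_kept < n_wanted) {
    tries <- tries + 1
    if (tries > max_tries) stop("too many rejected draws")
    row <- data[sample(nrow(data), 1), ]
    if (is_good(row)) {
      n_kept <- n_kept + 1
      kept[[n_kept]] <- row
    }
  }
  do.call(rbind, kept)
}

good25 <- collect_good(df, 25)
```

Preallocating the list and binding once at the end avoids growing a data frame inside the loop.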

score 4 · Answer 2 · answered Dec 10 '13 at 23:16

Use this after your first sample:

while (any(bad <- (x$SCORE <= 0)))
   x[bad, ] <- df[sample(nrow(df), sum(bad)), ]
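For context, here is a sketch of how that loop might sit in a complete script that ends with 25 accepted rows. The first draw and the `replace = TRUE` arguments are assumptions added for illustration (the example data has only 20 rows), not part of the answer as posted.

```r
set.seed(2)  # only to make the sketch reproducible
df <- data.frame(NAME = c(rep("Frank", 10), rep("Mary", 10)), SCORE = rnorm(20))

n <- 25
# First sample: with replacement, since n exceeds the 20 available rows.
x <- df[sample(nrow(df), n, replace = TRUE), ]

# Re-draw only the rejected rows, all at once, until none are left.
# Note: this never terminates if no row of df passes the test.
while (any(bad <- (x$SCORE <= 0))) {
  x[bad, ] <- df[sample(nrow(df), sum(bad), replace = TRUE), ]
}

# Row names still refer to the first draw, so reset them to avoid confusion.
rownames(x) <- NULL
```

Each pass replaces every rejected row in a single vectorised assignment, so the number of iterations is usually far smaller than with one draw per iteration.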

Ricardo Saporta


IRTFM · Answer 3 · 2013-12-11T04:46:15.620

3

random.sample <- function(x) {
  x <- df[sample(nrow(df), 1), ]
  if (x$SCORE > 0) return(x)
  Recall(x)  # run the function again
}

random.sample(df)
#   NAME    SCORE
#14 Mary 1.252566

It seems to me that this should work as well:

 df$SCORE[ df$SCORE > 0 ][ sample(1:sum(df$SCORE > 0), 1) ]
#[1] 0.6579631

edited Dec 11 '13 at 04:46

answered Dec 10 '13 at 23:13


VERY nice help. The Recall function is not even mentioned anywhere in all of my R manuals. Is it better if I use: if (x$SCORE > 0) { return(x) } else { Recall(x) }? – user1491868 Dec 10 '13 at 23:45
elegant but not as efficient as a `while` loop IMHO, as it can create a large call stack. – flodel Dec 11 '13 at 00:31
You are essentially doing rejection sampling. It could have been as simple as: `df$SCORE[df$SCORE > 0][sample(1:sum(df$SCORE > 0), 1)]`. I'm not sure how to advise on the checkmark. Mine was essentially a throw-away answer. Flodel is right about efficiency. Recursion is not well supported in R. – IRTFM Dec 11 '13 at 04:43
Regarding your `df$SCORE[df$SCORE > 0][...]`, it is the same thing I commented to Stephen: OP is giving a "*simplified example*" of a more complex situation where "*I do not know whether the data is "bad" until after I sample it*". So a recursion or a while loop are about the only possible solutions. – flodel Dec 11 '13 at 12:20

score 3 · Answer 4 · answered Dec 10 '13 at 23:14

You can just select the rows to sample directly like so (just 5):

> df <- data.frame(NAME=c(rep('Frank',10),rep('Mary',10)), SCORE=rnorm(20))
> df[sample(which(df$SCORE>0), 5),]
    NAME     SCORE
14  Mary 1.0858854
10 Frank 0.7037989
16  Mary 0.7688913
5  Frank 0.2067499
17  Mary 0.4391216

This samples without replacement; for a bootstrap, add replace=TRUE.
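A sketch of the with-replacement variant, sized to the 25 rows the question asks for, might look like this. The variable names `good`, `picked`, and `s` are made up for illustration. Indexing into `good` instead of passing it straight to `sample()` is a deliberate choice: `sample(v, ...)` treats a length-one numeric `v` as `1:v`, which would silently pick the wrong rows if only one row passed the test.

```r
set.seed(3)  # only to make the sketch reproducible
df <- data.frame(NAME = c(rep("Frank", 10), rep("Mary", 10)), SCORE = rnorm(20))

# Row indices that pass the test.
good <- which(df$SCORE > 0)
stopifnot(length(good) > 0)  # nothing to sample from otherwise

n <- 25
# Sample positions within `good`, with replacement because n may exceed length(good).
picked <- good[sample(length(good), n, replace = TRUE)]
s <- df[picked, ]
```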

Stephen Henderson


I upvoted but since the OP said *I do not know whether the data is "bad" until after I sample it* I am not sure that it will work for him. His example might have been poorly chosen. – flodel Dec 11 '13 at 03:01
@flodel Fair enough, but R is not a realtime app, nor good at recursive function calls, so if the data needs to be checked, the test is in the data and should be vectorised and put between the brackets, like this. – Stephen Henderson Dec 11 '13 at 08:06
Whether I keep the observation is a function of the observation itself. I cannot determine whether to keep the observation until after it has been drawn. – user1491868 Dec 11 '13 at 14:06
@user1491868 if the obs is really in a DF then you can do exactly that, subset by your criteria then sample...anyway it's not really that important is it :) – Stephen Henderson Dec 11 '13 at 15:34
After thinking things through I decided to subset my data by the criteria before sampling. But I still think this thread is useful when it is impossible to subset the data before sampling it. Thank you to everybody for their very helpful comments and suggestions. – user1491868 Dec 11 '13 at 16:34
