Need help understanding this probability code

Question

New to R... this is the famous (maybe?) probability simulation of a bus that starts empty and picks up 0, 1, or 2 passengers at each stop with probability of 0.5, 0.4, and 0.2 respectively. Also at every stop, the probability of each passenger getting off is 0.2

I can see that in the beginning it skips a portion of the code as the variable 'passengers' start off at 0. But what I don't understand is the runif(1) code - what's it doing here? Also another question - how do I track the vectors, I'd like to see how it stores information like in a table (new to R so haven't a clue).

(This code simulates the probability of an empty bus after 10 stops)

nreps <- 10000
nstops <- 10
count <- 0
for (i in 1:nreps){
  passengers <- 0
  for (j in 1:nstops){
    if(passengers > 0)
      for (k in 1:passengers)
        if(runif(1) < 0.2)
          passengers <- passengers - 1
    newpass <- sample(0:2, 1, prob=c(0.5, 0.4, 0.1))
    passengers <- passengers + newpass
  }
  if (passengers == 0) count <- count + 1
}
print(count/nreps)

The function call runif(1) draws a random value from a uniform distribution on the unit interval [0,1]. This probability of this random value being smaller than 0.2 is exactly 0.2. As such the probability of one passenger leaving the bus is 0.2 at every stop. — 67342343, Sep 02 '17 at 22:12
oh, that's makes sense. Do you have a tip about my second question? How might I access the tables where all the vector data is? I'm using R-Studio. — RawlinsCross, Sep 02 '17 at 22:18
I am not sure what you are referring too. There aren't really any tables in this code, only variables which on first sight are all integers. — 67342343, Sep 02 '17 at 22:21
So when "nsteps" is 10, or 100, or 1000, should it not generate 10, 100 or 1000 lines with all the values of variables? I'd like to view those to see how it's working? (if that make any sense?) — RawlinsCross, Sep 02 '17 at 22:23
In that case you could add some print statements (refer to documentation of e.g. `sprintf`) and potentially halt execution of your script after each iteration of the for loop (refer to https://stackoverflow.com/questions/15272916/how-to-wait-for-a-keypress-in-r to see how this can be done). — 67342343, Sep 02 '17 at 22:31
It's way better to pull all the values at once rather than to pull one at a time. You will get the same values, and it will be much faster. See the timings of my answer [on this post](https://stackoverflow.com/questions/21991130/simulating-a-random-walk/45900693#45900693) to get a better idea. — lmo, Sep 03 '17 at 00:47

score 1 · Answer 1 · answered Sep 02 '17 at 22:48

First, here's an explanation of each step:

nreps <- 10000
nstops <- 10
count <- 0

for (i in 1:nreps){  #for 10,000 iterations
  passengers <- 0  #start with zero passengers
  for (j in 1:nstops){  #for the number of stops (10) do...
    if(passengers > 0)  #for number of passengers, when number > 0...
      for (k in 1:passengers)  #for each passenger...
        if(runif(1) < 0.2)  #take 1 random val from a uniform dist, and if less than 0.2...
          passengers <- passengers - 1  #subtract 1 from passengers (i.e. n-1)
        newpass <- sample(0:2, 1, prob=c(0.5, 0.4, 0.1))  #take one random sample from 0,1,2, with probability weights 0.5, 0.4, 0.1, respectively
        passengers <- passengers + newpass  #add existing passengers to number of new passengers
  }
  if (passengers == 0) count <- count + 1  #if the number of passengers still equal zero by now, add 1 to the count
}

print(count/nreps)  #number of times when passenger = 0 divided by number of iterations

Now, if you want to track the progress of each iteration, you can first set up empty variables, then fill those variables according to the index of each iteration, like this:

passengers_before <- c()  #empty vector
passengers_after <- c()  #empty vector

for (i in 1:nreps){   #for 10,000 iterations
  passengers <- 0   #start with zero passengers
  passengers_before[i] <- passengers
  for (j in 1:nstops){   #for the number of stops (10) do...
    if(passengers > 0)   #for number of passengers, when number > 0...
      for (k in 1:passengers)   #for each passenger...
        if(runif(1) < 0.2)   #take 1 random val from a uniform dist, and if less than 0.2...
          passengers <- passengers - 1   #subtract 1 from passengers (i.e. n-1)
        newpass <- sample(0:2, 1, prob=c(0.5, 0.4, 0.1))   #take one random sample from 0,1,2, with probability weights 0.5, 0.4, 0.1, respectively
        passengers <- passengers + newpass   #add existing passengers to number of new passengers
  }
  passengers_after[i] <- passengers
  if (passengers == 0) count <- count + 1   #if the number of passengers still equal zero by now, add 1 to the count
}

passenger_sample <- data.frame(passengers_before,passengers_after)

print(count/nreps)

Now, passenger_sample will be a data.frame with the number of rows equal to the number of iterations (e.g. 10,000), and it will have two columns: passengers_before and passengers_after.

Here's the first first handful of rows:

head(passenger_sample)

  passengers_before passengers_after
1                 0                2
2                 0                4
3                 0                4
4                 0                4
5                 0                1
6                 0                2

Need help understanding this probability code

1 Answers1