I have four kinds of instructions, say I1, I2, I3 and I4, and my program has to pick one of them at each step according to the following probabilities:
- with 1% chance, I use I1
- with 85% chance, I use I2
- with 7% chance, I use I3
- with 7% chance, I use I4
(Note: 1% + 85% + 7% + 7% = 100%.)

Each instruction creates a two-coordinate point (x, y), which I append to a data frame df.

I know I can do this using the following code:

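df <- c(0, 0)
for (i in 1:n) {
  u <- sample(1:100, 1)              # random draw that selects the instruction
  if (u == 1) { I1 }
  if (u >= 2 & u <= 86) { I2 }
  if (u >= 87 & u <= 93) { I3 }
  if (u >= 94 & u <= 100) { I4 }
  df <- rbind(df, c(x, y))           # append the point (x, y) produced by the instruction
}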

But this is horribly slow as soon as n is not small. I would like to vectorize the code to speed it up, but I cannot see how.
Any ideas?

For completeness: the program is meant to reproduce the so-called Barnsley fern.

A1 <- matrix(c(0,0,0,0.16), byrow=FALSE, ncol=2)
V1 <- matrix(c(0,0), byrow=FALSE, ncol=1)
A2 <- matrix(c(0.85,-0.04,0.04,0.85), byrow=FALSE, ncol=2)
V2 <- matrix(c(0,1.6), byrow=FALSE, ncol=1)
A3 <- matrix(c(0.2,0.23,-0.26,0.22), byrow=FALSE, ncol=2)
V3 <- matrix(c(0,1.6), byrow=FALSE, ncol=1)
A4 <- matrix(c(-0.15,0.26,0.28,0.24), byrow=FALSE, ncol=2)
V4 <- matrix(c(0,0.44), byrow=FALSE, ncol=1)

z1 <- matrix(c(0,0), byrow=FALSE, ncol=1)

df <- as.data.frame(t(z1))

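for (i in 1:1000) {
  x <- sample(1:100, 1)     # random draw that selects one of the four affine maps

  if (x == 1) { z2 <- A1 %*% z1 + V1 }
  if (x >= 2 & x <= 86) { z2 <- A2 %*% z1 + V2 }
  if (x >= 87 & x <= 93) { z2 <- A3 %*% z1 + V3 }
  if (x >= 94 & x <= 100) { z2 <- A4 %*% z1 + V4 }

  df <- rbind(df, t(z2))    # append the new point as a row
  z1 <- z2                  # the new point is the input of the next iteration
}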

plot(df$V1, df$V2, xlab="", ylab="")
Andrew
  • Look at findInterval. There are many worked examples on SO. – IRTFM Mar 10 '17 at 16:39
  • The slow part of your code is probably the `rbind()` part, not the `if` statements. Without knowing exactly what these "instructions" are it's hard to say if there's a better way to vectorize. It's easier to help if you provide some sort of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can actually run the code ourselves. – MrFlick Mar 10 '17 at 16:39
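
As a rough sketch of the `findInterval` suggestion above: all the random draws can be made in one call and then mapped to an instruction index 1 to 4, without any `if` inside a loop. The names `u`, `breaks` and `k`, the seed and the value of `n` are assumptions chosen for illustration; the break points are the lower bounds of the ranges used in the question.

```r
set.seed(1)                              # arbitrary seed, only for reproducibility
n <- 10000                               # number of draws (assumed value)
u <- sample(1:100, n, replace = TRUE)    # all draws at once

# Lower bounds of the four ranges in the question: 1, 2-86, 87-93, 94-100
breaks <- c(1, 2, 87, 94)
k <- findInterval(u, breaks)             # k[i] is the instruction index: 1, 2, 3 or 4

table(k) / n                             # observed frequencies, to compare with 1%, 85%, 7%, 7%
```

This only vectorizes the choice of instruction. Whether the instructions themselves can be applied in a vectorized way depends on what they do.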

1 Answer

The `sample` function is already vectorized: it can draw all n values in a single call. Here is a simple example that could be adapted to your needs.

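 # run these lines first
 # define your functions (placeholder instructions returning constants)
 I1 <- function() { return(10) }
 I2 <- function() { return(20) }
 I3 <- function() { return(30) }
 I4 <- function() { return(40) }

 n <- 100
 # draw n results in one call; prob holds the relative weights of the four instructions
 sampling <- sample(c(I1(), I2(), I3(), I4()), n, replace = TRUE, prob = c(1, 85, 7, 7))

 df <- data.frame(x = 1:n, y = sampling)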

This should be all you need. Just remember to define your functions before attempting to evaluate the sample function.

Note: after reading your edit, I see that the instructions are not constant: each new point depends on the previous one. The approach above will therefore not work as written.
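
A possible partial workaround, offered only as a sketch: the random choice of transform can still be drawn for all iterations in one call, and the repeated `rbind()` can be replaced by a preallocated matrix. The loop itself remains, because each point depends on the previous one. The coefficients are copied from the question; the names `A`, `V`, `k`, `pts` and `fern`, the seed and the value of `n` are assumptions made for this example.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 10000     # number of iterations (assumed value)

# Coefficients of the four affine maps, as given in the question
A <- list(
  matrix(c(0, 0, 0, 0.16), ncol = 2),
  matrix(c(0.85, -0.04, 0.04, 0.85), ncol = 2),
  matrix(c(0.2, 0.23, -0.26, 0.22), ncol = 2),
  matrix(c(-0.15, 0.26, 0.28, 0.24), ncol = 2)
)
V <- list(c(0, 0), c(0, 1.6), c(0, 1.6), c(0, 0.44))

# Vectorized part: draw all n transform indices at once
k <- sample(1:4, n, replace = TRUE, prob = c(0.01, 0.85, 0.07, 0.07))

# Preallocate the result; row 1 is the starting point (0, 0)
pts <- matrix(0, nrow = n + 1, ncol = 2)
for (i in seq_len(n)) {
  pts[i + 1, ] <- A[[k[i]]] %*% pts[i, ] + V[[k[i]]]
}

fern <- as.data.frame(pts)   # columns are named V1 and V2, as in the question
plot(fern$V1, fern$V2, xlab = "", ylab = "", pch = ".")
```

Writing into a preallocated matrix avoids copying the whole result at every iteration, which is what growing a data frame with `rbind()` does.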

Dave2e