How can I generate random sample data from the quantiles of the unknown density f(x)
for x
between 0
and 4
in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
How can I generate random sample data from the quantiles of the unknown density f(x)
for x
between 0
and 4
in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
If I understand you correctly (??) you want to generate random samples with the distribution whose density function is given by f(x)
. One way to do this is to generate a random sample from a uniform distribution, U[0,1]
, and then transform this sample to your density. This is done using the inverse cdf of f
, a methodology which has been described before, here.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x)
is only defined on [0,Inf]
, we calculate F(x)
as the integral of f(x)
from 0 to x. Then we invert that using the uniroot(...)
function on F-y
. The use of Vectorize(...)
is needed because, unlike almost all R functions, integrate(...)
and uniroot(...)
do not operate on vectors. You should look up the help files on these functions for more information.
Now we just generate a random sample X
drawn from U[0,1]
and transform it with Z = F.inv(X)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z
is indeed distributed as f(x)
.
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0 ,4)
z <- runif(n)
subset(x, z < f(x)) # Rejection
}
Not the most efficient but it gets the job done.
Use sample
. Generate a vector of probablities from your existing function f
, normalized properly. From the help page:
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.