2

I try to sample from the following mixture models of gamma distribution:

enter image description here

The R code is as follows:

The algorithm can be translated into a vectorized approach.

Step 1: Generate a random sample k_1,...,k_n of integers in a vector k ,where P(k)=θ_k, k=1,...,5.

Step 2: Set rate=1/k.

n <- 5000
k <- sample(1:5, size=n, replace=TRUE, prob=(1:5)/15)
rate <- 1/k
x <- rgamma(n, shape=3, rate=rate)

My question is why x is now the mixture of these five gamma distributions? In the expression of the mixture model, it seems that we also need coefficient theta_k?

oliver
  • 157
  • 9

1 Answers1

2

Here are two ways to compare samples from a Gamma mixture distribution with the expected mixture density. This should help understand how F_X is the cumulative distribution function of a mixture of Gamma distributions.

# Fix random seed for reproducibility
set.seed(2022)

# Sample data
n <- 100000
X <- unlist(lapply(1:5, function(j) rgamma(n * j, shape = 3, rate = 1 / j)))

# Weighted Gamma mixture density
dmix <- function(x)  sapply(
    x,
    function(val) sum((1:5) / 15 * dgamma(val, shape = 3, rate = 1 / (1:5))))
library(tibble)
data = tibble(x = seq(0, ceiling(max(X)), length.out = 100), y = dmix(x))

# Plot histogram of samples and compare with density 
library(ggplot2)
ggplot(data.frame(x = X), aes(x)) +
    geom_histogram(aes(y = ..density..), bins = 200) + 
    geom_line(data = data, aes(x, y))

enter image description here

Comments:

  1. When sampling we adjust the number of samples from each Gamma distribution based on the weights; in this case the weights are simply 1:5.
  2. We use aes(y = ..density..) to express the histogram as a properly normalised density so that we can compare values with the mixture density dmix.
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Thanks. Can you comment on the code I post? I have question about the code, – oliver Jun 27 '22 at 05:50
  • Ah sorry I misunderstood what you were asking. The important thing to recognise here is that `rgamma` is vectorised. In step 2 of your algo, you are sampling `n` variables from different gamma distributions, with the following parameter values: shape = 3 (for all), and 1/rate was sampled `n` times from 1:5 with probabilities given by the weight theta, i.e. `(1:5)/15`. This gives you a vector of length `n` where rate values are exactly distributed according to the weights. Since `rgamma` is vectorised in `shape` this corresponds to drawing samples from the mixture gamma. – Maurits Evers Jun 27 '22 at 06:16
  • PS. You can confirm that samples drawn from your algo match the expected distribution from my post (you may need to increase your sample size to something like `n <- 100000`). – Maurits Evers Jun 27 '22 at 06:19