2

I am looking for a package/ code that would generate bivariate Pareto distribution, when two random variables are correlated (and the correlation can be specified by the user). I would be grateful for your help!

Arbiturka
  • 21
  • 5

1 Answers1

4

I couldn't find any existing R packages, but found your question interesting; so I would like to show you a way how to sample from the bivariate type I Pareto distribution using inverse transform sampling.


Theory

The joint pdf of the bivariate Pareto distribution of type I is given by

enter image description here

The goal here is to

  1. draw samples for x2 from the marginal distribution f(x2), and then
  2. draw samples for x1 given x2 from the conditional distribution f(x1|x2).

The marginal and conditional distributions are given by (see e.g. [Mardia, Annals of Mathematical Statistics 33, 1008 (1962)])

enter image description here

We can draw samples using inverse transform sampling, which requires the cumulative distribution functions for both the marginal and conditional distributions. That's easy to calculate, and we get

enter image description here

Then samples for x1 and x2 are given by

enter image description here

where u is a random number from the standard uniform distribution in the interval [0,1].


R implementation

  1. We define two functions to sample values for x1 and x2 from the marginal and conditional distributions using inverse transform sampling as detailed above.

    rpareto_inv <- function(n, theta, a) {
        u <- runif(n, min = 0, max = 1);
        return(theta / (u ^ (1 / a)));
    }
    
    
    rpareto_cond_inv <- function(x2, theta1, theta2, a) {
      u <- runif(length(x2), min = 0, max = 1);
      return(theta1 + theta1 / theta2 * x2 * (1 / (u ^ (1 / (a + 1))) - 1));
    }
    
  2. We chose some values for the sampling and distribution parameters:

    n <- 10^5;     # Number of samples
    theta1 <- 5;   # Location parameter 1
    theta2 <- 2;   # Location parameter 2
    a <- 3;        # Shape parameter
    
  3. Now we can draw samples

    set.seed(2017);
    x2 <- rpareto_inv(n, theta = theta2, a = a);
    x1 <- rpareto_cond_inv(x2, theta1, theta2, a);
    
  4. We can show a 2d density plot and compare some sample summary statistics with their theoretical (population) values.

    require(ggplot2);
    df <- cbind.data.frame(x1 = x1, x2 = x2);
    ggplot(df, aes(x1, x2)) +
        geom_density_2d() +
        xlim(theta1, 1.5 * theta1) +
        ylim(theta2, 1.5 * theta2);
    

    enter image description here

    metrics <- cbind.data.frame(
        obsrv = c(mean(df$x1), mean(df$x2), cor(df$x1, df$x2), cov(df$x1, df$x2)),
        theor = c(a * theta1 / (a - 1), a * theta2 / (a - 1), 1/a, theta1 * theta2 / ((a - 1)^2 * (a - 2))));
    rownames(metrics) <- c("Mean(x1)", "Mean(x2)", "Correlation", "Covariance")
    #                obsrv     theor
    #Mean(x1)    7.4947124 7.5000000
    #Mean(x2)    3.0029318 3.0000000
    #Correlation 0.3429634 0.3333333
    #Covariance  2.3376545 2.5000000
    

    You can see that the agreement is good. Also note that the correlation between x1 and x2 is characterised by the scale parameter a. Consequently, if you wanted to simulate data for a bivariate Pareto distribution with a specific correlation r, you'd just need to set the shape parameter to 1/r. More details on the distribution and additional summary statistics can be found in [Mardia, Annals of Mathematical Statistics 33, 1008 (1962)].

Lastly, you could also use a simple accept-reject sampling method, but I imagine that being a lot slower than the inverse transform sampling approach I'm showing here.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68