I need to use SAS to generate random values for two correlated, beta-distributed variables. The two variables of interest are characterized as follows:


X1 has mean = 0.896 and variance = 0.001.

X2 has mean = 0.206 and variance = 0.004.

For X1 and X2, rho = 0.5, where rho is the correlation coefficient.


In SAS, I know how to draw a single beta-distributed value with X = RAND('BETA', a, b), where a and b are the two shape parameters of X and can be calculated from its mean and variance. However, I want to generate values for X1 and X2 jointly so that they are correlated at rho = 0.5.

  • I don't have an answer as this is outside my understanding, but I would recommend looking up Rick Wicklin's book, [Simulating Data With SAS](http://www.sas.com/store/books/categories/usage-and-reference/simulating-data-with-sas-/prodBK_65378_en.html). It may well cover how to do this. He also has some articles on [the Do Loop](http://blogs.sas.com/content/iml/) which contain some of the same information, so it may be worth looking there as well. – Joe Jul 16 '15 at 16:35
  • @Joe thanks for pointing me to Rick Wicklin's book. I came to a solution based on modified methods from Chapter 9 - Advanced Simulation of Multivariate Data [9.5: Generating Data From Copulas]. I will post the answer tomorrow. – Gavin M. Jones Jul 17 '15 at 01:07
  • There are, of course, infinitely many joint distributions that have the properties that you specify. Copulas are one way to choose a particular simple dependence structure, and you have already found the book and chapter that I recommend. Be aware that not all correlations are possible. I don't know whether rho=0.5 is permitted or not. For an example, see http://blogs.sas.com/content/iml/2012/09/12/when-is-a-correlation-matrix-not-a-correlation-matrix.html – Rick Jul 17 '15 at 10:56
  • You may also want to check Steen Magnussen's 2002 paper: http://www.sciencedirect.com/science/article/pii/S0167947303001695 – MichaelChirico Nov 04 '16 at 20:19

1 Answer

This solution adapts the methods in Chapter 9 of Simulating Data with SAS by Rick Wicklin.

In this example, I first define the means and variances of the two variables and compute the beta shape parameters (alpha, beta) from them:

data beta_corr_vars;
    input x1 var1 x2 var2;  *mean1, variance1, mean2, variance2;
    *calculate shape parameters alpha and beta from means and variances;
    alpha1 = ((1 - x1) / var1 - 1/ x1) * x1**2;   
    alpha2 = ((1 - x2) / var2 - 1/ x2) * x2**2; 
    beta1 = alpha1 * (1 / x1 - 1);
    beta2 = alpha2 * (1 / x2 - 1);
    *here are the means and variances referred to in the original question;
    datalines; 
0.896 0.001 0.206 0.004
;
run;
proc print data = beta_corr_vars;
run;
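
As a worked check (not part of the original answer), the shape-parameter formulas in the DATA step above are the method-of-moments expressions for a beta distribution with mean $\mu$ and variance $\sigma^2$. They only give valid (positive) parameters when $\sigma^2 < \mu(1-\mu)$. The symbol $k$ is introduced here purely for illustration:

$$
k = \frac{\mu(1-\mu)}{\sigma^2} - 1, \qquad \alpha = \mu\,k, \qquad \beta = (1-\mu)\,k
$$

Expanding $\alpha = \mu k$ gives $\left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2$, and $\beta = \alpha\left(\frac{1}{\mu} - 1\right)$, which is the form used in the code. For the values in the question:

$$
\begin{aligned}
X_1:&\quad k = \frac{0.896 \times 0.104}{0.001} - 1 = 92.184, & \alpha_1 &\approx 82.597, & \beta_1 &\approx 9.587 \\
X_2:&\quad k = \frac{0.206 \times 0.794}{0.004} - 1 = 39.891, & \alpha_2 &\approx 8.218, & \beta_2 &\approx 31.673
\end{aligned}
$$

These should match the values shown by PROC PRINT.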

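Once these variables are defined, simulate correlated normal variates, transform them to uniform variates, and apply the inverse beta CDF to each column: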

proc iml;
  use beta_corr_vars; read all; 
  call randseed(12345);
      N = 10000;                  *number of random variable sets to generate;
      *simulate bivariate normal data with a specified correlation (here, rho = 0.5);
      Z = RandNormal(N, {0 0}, {1 0.5, 0.5 1});   *RandNormal(N, Mean, Cov), with Mean as a 1 x 2 row vector;
      *transform the normal variates into uniform variates;
      U = cdf("Normal", Z);      

      *From here, we can obtain beta variates for each column of U by; 
      *applying the inverse beta CDF;
      x1_beta = quantile("Beta", U[,1], alpha1, beta1);        
      x2_beta = quantile("Beta", U[,2], alpha2, beta2); 
      X = x1_beta || x2_beta; 

  *check the correlations: rhoZ approaches 0.5 as the number of sims (N) grows;
  *rhoX is close to 0.5 but not exactly equal, because the transforms are nonlinear;
  rhoZ = corr(Z)[1,2];
  rhoX = corr(X)[1,2];

  print X;
  print rhoZ rhoX;
quit;
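
One way to sanity-check the simulated data is to compare the sample means and variances with the targets and to look at the rank correlation as well as the Pearson correlation. The block below is a minimal, self-contained sketch of that check, not part of the original answer. The names mu, s2, k, sampleMean, sampleVar, rhoPearson, and rhoSpearman are illustrative, and it assumes a SAS/IML release in which mean(), var(), and corr() with the "Spearman" method are available.

```sas
proc iml;
  call randseed(12345);

  /* target means and variances from the question */
  mu = {0.896 0.206};
  s2 = {0.001 0.004};

  /* method-of-moments shape parameters (elementwise operations) */
  k     = mu # (1 - mu) / s2 - 1;
  alpha = mu # k;
  beta  = (1 - mu) # k;

  /* Gaussian copula: correlated normals -> uniforms -> beta margins */
  N = 10000;
  Z = RandNormal(N, {0 0}, {1 0.5, 0.5 1});
  U = cdf("Normal", Z);
  X = quantile("Beta", U[,1], alpha[1], beta[1]) ||
      quantile("Beta", U[,2], alpha[2], beta[2]);

  /* column means and variances should be close to mu and s2 */
  sampleMean = mean(X);
  sampleVar  = var(X);

  /* Pearson and Spearman correlations of the beta variates */
  rhoPearson  = corr(X)[1,2];
  rhoSpearman = corr(X, "Spearman")[1,2];

  print mu sampleMean, s2 sampleVar;
  print rhoPearson rhoSpearman;
quit;
```

Because the quantile transforms are monotone, the Spearman correlation of X equals that of Z, which for a bivariate normal with rho = 0.5 is (6/pi)*arcsin(0.25), roughly 0.483. If the Pearson correlation of X must be 0.5 exactly, the normal correlation passed to RandNormal can be adjusted slightly and rechecked.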

Thank you to all users who contributed to this answer.
