2

I try to simulate highly correlated (average absolute correlation of ~0.7) variables for 25 variables using 4,000 observations.

However, I can't get Sigma positive definite, and thus I need to use a trick to get sigma PD. And hereby I lose most of the initial correlation, I go from ~0.7 to ~0.25.

The reason might be that I need positive ánd negative correlated variables. I am able to simulate highly correlated variables for 25 variables if all correlations are only positive, however I need 50/50 positive/negative on average.

I make a distinction in the code for informative (relevant) & uninformative (noisy) variables.

Is there a solution to this issue?

Thank you for your time.

library("matrixcalc")
library("MASS")
library("Matrix")

p_inf <- 4
p_unf <- 21
N <- 4000

# Number of unique correlations in correlation matrix
n_inf_sample = (p_inf^2-p_inf)/2
n_unf_unf_and_inf_unf = p_inf*p_unf+(p_unf^2-p_unf)/2

Corr_inf_sample <- sample(runif(n_inf_sample,-0.3,0.3))

Corr_unf_sample <- sample(c(runif(n_unf_unf_and_inf_unf, -0.99,0.99),
                            runif(n_unf_unf_and_inf_unf,0.7,0.9),
                            runif(n_unf_unf_and_inf_unf,-0.9,-0.7)
                            ), n_unf_unf_and_inf_unf)

Corr_matrix <- matrix(ncol = p_inf+p_unf, nrow = p_inf+p_unf,0)
Corr_matrix[upper.tri(Corr_matrix, diag= FALSE)] <- c(Corr_inf_sample,Corr_unf_sample)
Corr_matrix[lower.tri(Corr_matrix, diag= FALSE)] <- t(Corr_matrix)[lower.tri(t(Corr_matrix))]
diag(Corr_matrix) <- 1

#  Mu and Sigma 
mu<- c(runif(p_inf+p_unf, -3,3)) 
sigma<- Corr_matrix

if (is.positive.definite(as.matrix(sigma))) {
  df <- as.data.frame(mvrnorm(n=N, mu=mu, Sigma=sigma))
  print("Sigma is Positive Definite")
}
if (!is.positive.definite(as.matrix(sigma))) {
  sigma2 <- nearPD(sigma)
  df<-as.data.frame(mvrnorm(n=N, mu=mu, Sigma=sigma2$mat)) 
  print("Sigma is not Positive Definite, we create PD matrix")
}

print(c("Mean absolute accorelation sigma: ", round(mean(abs(sigma)),3)))
print(c("Mean absolute correlation variables: " , round(mean(abs(cor(df))),3)))
  • 1
    This seems like a cross Validated or Stack.Exchange question. ie how to get a PD matrix with high correlation of 0.7 I believe once the Sigma is obtained the rest is straight forward – Onyambu Apr 21 '22 at 20:36
  • The first thought that comes to mind is that you could just do your simulation with only positive correlations and afterwards flip some of the simulated variables so they have negative correlations to the others. Would that suffice the purpose of your simulation? – benimwolfspelz Apr 22 '22 at 08:02
  • @benimwolfspelz Creative idea. Although I think it would work, I need this simulation for my thesis, and thus I think that I need to use `academic approved' simulation methods, so I rather not manually change data myself to obtain higher correlations :) – Constantijn de Jonge Apr 22 '22 at 21:37

0 Answers0