
Is it possible to generate two variables with a correlation of exactly 1 using the Cholesky decomposition technique?

```r
set.seed(88)
mu <- 0
sigma <- 1
x <- rnorm(10000, mu, sigma)
y <- rnorm(10000, mu, sigma)
MAT <- cbind(x, y)
cor(MAT[, 1], MAT[, 2])

# This fails: with an off-diagonal of 1 the matrix is singular, hence NOT positive definite.
# Any off-diagonal value strictly between -1 and 1 (e.g. 0.99) works.
correlationMAT <- matrix(1, nrow = 2, ncol = 2)

U <- chol(correlationMAT)
newMAT <- MAT %*% U
cor(newMAT[, 1], newMAT[, 2])  # ...but I want this correlation to be 1
```

Any ideas?
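For context, a quick check of why plain `chol()` rejects the all-ones matrix: a 2x2 correlation matrix with off-diagonal `rho` has eigenvalues `1 + rho` and `1 - rho`, so at `rho = 1` one eigenvalue is 0. The matrix is then positive semi-definite but not positive definite, which the ordinary (unpivoted) Cholesky factorization requires. A minimal sketch; `corr2` is a hypothetical helper introduced only for illustration:

```r
# Hypothetical helper: 2x2 correlation matrix with off-diagonal rho
corr2 <- function(rho) matrix(c(1, rho, rho, 1), nrow = 2)

# Eigenvalues are 1 + rho and 1 - rho
print(eigen(corr2(1))$values)     # approximately 2 and 0: singular
print(eigen(corr2(0.99))$values)  # approximately 1.99 and 0.01: positive definite

# Unpivoted Cholesky errors on the singular matrix...
fails <- inherits(try(chol(corr2(1)), silent = TRUE), "try-error")
print(fails)

# ...but succeeds for rho = 0.99, and t(U99) %*% U99 reproduces the matrix
U99 <- chol(corr2(0.99))
print(U99)
```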

Asked by user3022875
  • If you want to create variables with `cor` 1, that's pretty easy... `x <- rnorm(10000)`, `y <- constant * x` where `constant` is anything you want that's greater than 0. But if there's noise, then the correlation will be less than 1. – Gregor Thomas Oct 14 '14 at 16:24
  • @Gregor - But I want to do the transformation using `chol` or something similar. – user3022875 Oct 14 '14 at 16:38
  • @user3022875 Why would it matter how you did it? 1 is a pathological case, but it's a trivial one. – David Robinson Oct 14 '14 at 16:40
  • If you want to use matrix multiplication you can do `MAT %*% M` where `M <- matrix(c(1, 0, constant, 0), 2)`, and `constant` is > 0. But, once again, it won't matter what the second column of `MAT` is... if your answer depends on that second column then the correlation won't be 1. – Gregor Thomas Oct 14 '14 at 16:45
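A minimal sketch of the two suggestions in the comments above. `constant <- 3` is an arbitrary positive value chosen for illustration; any value greater than 0 gives a correlation of +1, and the second column of `MAT` never enters the result.

```r
set.seed(88)
x <- rnorm(10000)
constant <- 3                         # illustrative value; any constant > 0 works
y <- constant * x                     # exact linear function of x, no noise
print(cor(x, y))

# Same idea as a matrix product: M has columns (1, 0) and (constant, 0),
# so MAT %*% M is cbind(x, constant * x) regardless of MAT[, 2]
MAT <- cbind(x, rnorm(10000))
M <- matrix(c(1, 0, constant, 0), 2)
newMAT <- MAT %*% M
print(cor(newMAT[, 1], newMAT[, 2]))
```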

1 Answer

Actually you can, by using pivoted Cholesky factorization.

```r
correlationMAT <- matrix(1, nrow = 2, ncol = 2)
U <- chol(correlationMAT, pivot = TRUE)
#Warning message:
#In chol.default(correlationMAT, pivot = TRUE) :
#  the matrix is either rank-deficient or indefinite

U
#     [,1] [,2]
#[1,]    1    1
#[2,]    0    0
#attr(,"pivot")
#[1] 1 2
#attr(,"rank")
#[1] 1
```
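The `pivot` and `rank` attributes are what make this result usable. According to R's `?chol`, with `pivot = TRUE` it is `t(U) %*% U` that equals `x[pivot, pivot]`, and reordering the columns of `U` with `order(pivot)` recovers `x` itself. A short check under the same `correlationMAT`; the warning is suppressed only because it is expected here:

```r
correlationMAT <- matrix(1, nrow = 2, ncol = 2)
# pivot = TRUE warns that the matrix is rank-deficient; that is expected here
U <- suppressWarnings(chol(correlationMAT, pivot = TRUE))
piv <- attr(U, "pivot")

# t(U) %*% U reproduces the pivoted matrix x[piv, piv]
pivoted_ok <- isTRUE(all.equal(crossprod(U), correlationMAT[piv, piv],
                               check.attributes = FALSE))

# Undoing the column pivoting with order(piv) reproduces x itself
Uo <- U[, order(piv)]
recovered_ok <- isTRUE(all.equal(crossprod(Uo), correlationMAT,
                                 check.attributes = FALSE))

print(c(pivoted_ok = pivoted_ok, recovered_ok = recovered_ok,
        rank = attr(U, "rank")))
```

Here the pivot is the identity `(1, 2)`, so `Uo` equals `U`; for a general rank-deficient matrix the reordering step matters.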

Note that both columns of `U` are `(1, 0)'`. Because the pivot here is the identity permutation `(1, 2)`, `MAT %*% U` simply copies `MAT[, 1]` into both columns, so the second random variable is identical to the first.

```r
newMAT <- MAT %*% U

cor(newMAT)
#     [,1] [,2]
#[1,]    1    1
#[2,]    1    1
```

You need not worry that the two random variables are identical. They are identical only in standardized form (N(0, 1)). Rescale them by different standard deviations and shift them by different means to make them distinct; such a positive linear transformation leaves the correlation at 1.
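A minimal sketch of that rescaling step. To stay self-contained it builds the two identical standard normal columns directly with `cbind(x, x)` (which is what `MAT %*% U` produces above); the means 5 and -3 and standard deviations 2 and 0.5 are arbitrary illustrative values. A negative scale factor on one variable would flip the correlation to -1.

```r
set.seed(88)
x <- rnorm(10000)
newMAT <- cbind(x, x)               # two identical standard normal columns

a <- 5 + 2 * newMAT[, 1]            # target mean 5, sd 2 (illustrative)
b <- -3 + 0.5 * newMAT[, 2]         # target mean -3, sd 0.5 (illustrative)

# Sample moments are close to the targets, not exact
print(c(mean(a), sd(a), mean(b), sd(b)))
# Correlation is unchanged by positive scaling and shifting
print(cor(a, b))
```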


Pivoted Cholesky factorization is very useful. My answer to the post "Generate multivariate normal r.v.'s with rank-deficient covariance via Pivoted Cholesky Factorization" gives a more comprehensive picture.
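As a rough sketch of the general idea (not the code from the linked answer), a sampler for a possibly rank-deficient covariance might look like this. `rmvn_pivchol` is a hypothetical name. It assumes, cautiously, that rows of the factor beyond the reported numerical `rank` may not be meaningful and zeroes them; it then undoes the pivoting as described in `?chol`. The example `Sigma` has standard deviations 2 and 0.5 and covariance 1, so its correlation is exactly 1.

```r
# Hypothetical helper: n draws from N(mu, Sigma), Sigma possibly rank-deficient
rmvn_pivchol <- function(n, mu, Sigma) {
  p <- nrow(Sigma)
  # Rank-deficient input triggers an expected warning; suppress it
  Q <- suppressWarnings(chol(Sigma, pivot = TRUE))
  r <- attr(Q, "rank")
  # Assumption: rows beyond the numerical rank are not a valid factor, so zero them
  if (r < p) Q[(r + 1):p, ] <- 0
  # Undo the column pivoting so that t(Q) %*% Q approximates Sigma
  Q <- Q[, order(attr(Q, "pivot")), drop = FALSE]
  Z <- matrix(rnorm(n * p), nrow = n)   # iid N(0, 1) draws
  sweep(Z %*% Q, 2, mu, "+")            # add the mean vector to each row
}

set.seed(88)
Sigma <- matrix(c(4, 1, 1, 0.25), 2)    # sd 2 and 0.5, correlation exactly 1
X <- rmvn_pivchol(10000, mu = c(5, -3), Sigma = Sigma)
print(cor(X))
print(colMeans(X))
```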

Answered by Zheyuan Li