An apparently simple problem: I want to generate two simulated variables (x, y) from a bivariate distribution with a given correlation matrix. In other words, I want two variables/vectors with values of either 0 or 1 and a specified correlation between them.
The normal case is easy with mvrnorm() from the MASS package:
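library(MASS)   # mvrnorm()
library(dplyr)  # %>% and mutate()

df_norm = mvrnorm(
  100, mu = c(x = 0, y = 0),
  Sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2),
  empirical = TRUE) %>%   # empirical = TRUE makes the sample correlation exactly 0.5
  as.data.frame()
cor(df_norm)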
x y
x 1.0 0.5
y 0.5 1.0
Yet how can I generate binary data with a given correlation matrix?
Dichotomizing the normal variables at zero does not work:
df_bin = df_norm %>%
  mutate(
    x = ifelse(x < 0, 0, 1),
    y = ifelse(y < 0, 0, 1))
head(df_bin, 10)
x y
1 0 1
2 0 1
3 1 1
4 0 1
5 1 0
6 0 0
7 1 1
8 1 1
9 0 0
10 1 0
Although this creates binary variables, the correlation is not even close to 0.5:
cor(df_bin)
x y
x 1.0000000 0.2994996
y 0.2994996 1.0000000
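The drop is expected rather than a bug. For a standard bivariate normal with correlation rho, splitting both variables at 0 gives P(x = 1, y = 1) = 1/4 + asin(rho) / (2 * pi) (Sheppard's formula), so the correlation of the resulting 0/1 variables (the phi coefficient) is phi = (2 / pi) * asin(rho). With rho = 0.5 that is 1/3, which matches the 0.2995 above up to sampling noise. Inverting the relation, rho = sin(pi * phi / 2), gives the latent correlation to feed into mvrnorm(). A minimal sketch, assuming both binary variables are split at the median (P(x = 1) = P(y = 1) = 0.5); other thresholds would need the latent correlation to be solved numerically instead:

```r
library(MASS)  # for mvrnorm()

set.seed(1)

target_phi <- 0.5
# Latent normal correlation that yields target_phi after splitting at 0.
# Only valid for symmetric splits, i.e. P(x = 1) = P(y = 1) = 0.5.
latent_rho <- sin(pi * target_phi / 2)   # sin(pi / 4), about 0.707

z <- mvrnorm(
  n = 1e5, mu = c(x = 0, y = 0),
  Sigma = matrix(c(1, latent_rho, latent_rho, 1), nrow = 2)
)
# Same dichotomization as before: 1 if positive, 0 otherwise
df_bin <- as.data.frame((z > 0) * 1)

cor(df_bin)  # off-diagonal should be close to 0.5 for large n
```

empirical = TRUE is left out on purpose here: it would fix the latent sample correlation exactly, but the binary correlation would still vary from sample to sample.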
Ideally, I would like to be able to specify the type of distribution as an argument of the function (similar to the family argument of glm()).
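One way to get a "distribution as an argument" interface is a Gaussian copula: draw correlated normals, map them to uniforms with pnorm(), then apply any quantile function (qbinom, qpois, qnorm, ...) to each column. The helper name rcor_copula and its arguments are hypothetical, made up for this sketch. An important caveat: for non-normal margins the correlation of the output is generally not equal to rho, so rho has to be tuned to hit a target. For two 50/50 binary variables, a latent rho of sin(pi / 4), about 0.707, gives a correlation of about 0.5.

```r
library(MASS)  # for mvrnorm()

# Hypothetical helper (not from any package): dependence comes from a
# bivariate normal with correlation rho, margins from the quantile
# function qfun; extra arguments in ... are passed on to qfun.
rcor_copula <- function(n, rho, qfun = qnorm, ...) {
  z <- mvrnorm(n, mu = c(x = 0, y = 0),
               Sigma = matrix(c(1, rho, rho, 1), nrow = 2))
  u <- pnorm(z)                  # uniform margins, same rank dependence
  out <- apply(u, 2, qfun, ...)  # apply the chosen margin to each column
  as.data.frame(out)
}

set.seed(42)
# Binary margins: qbinom with size = 1 is the Bernoulli quantile function
df_bin <- rcor_copula(1e5, rho = sin(pi / 4), qfun = qbinom, size = 1, prob = 0.5)
cor(df_bin)

# Poisson margins with the same mechanism (output correlation is not exactly 0.5)
df_pois <- rcor_copula(1e5, rho = 0.5, qfun = qpois, lambda = 3)
cor(df_pois)
```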
Any ideas?
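For the binary case specifically, a latent normal is not strictly needed. A pair of 0/1 variables is fully determined by p1 = P(x = 1), p2 = P(y = 1) and p11 = P(x = 1, y = 1), and since each has variance p(1 - p), a target correlation r implies p11 = p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2)). The four cell probabilities follow, and the cells can be sampled directly with sample(). A minimal sketch; the helper name rbinary_pair is hypothetical. Not every r is attainable for given margins (for example, p1 = 0.1 and p2 = 0.9 rule out strong positive correlation), so the sketch stops when a cell probability would be negative:

```r
# Hypothetical helper: samples n pairs from the bivariate Bernoulli
# distribution with P(x = 1) = p1, P(y = 1) = p2 and correlation r.
rbinary_pair <- function(n, p1, p2, r) {
  # Joint probability of x = 1 and y = 1 implied by the target correlation
  p11 <- p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  probs <- c(`00` = 1 - p1 - p2 + p11,
             `01` = p2 - p11,
             `10` = p1 - p11,
             `11` = p11)
  if (any(probs < 0)) stop("correlation r is not attainable for these p1, p2")
  # Cells 1..4 correspond to (x, y) = (0,0), (0,1), (1,0), (1,1)
  cell <- sample(1:4, size = n, replace = TRUE, prob = probs)
  data.frame(x = as.integer(cell %in% c(3, 4)),
             y = as.integer(cell %in% c(2, 4)))
}

set.seed(123)
df_bin <- rbinary_pair(1e5, p1 = 0.5, p2 = 0.5, r = 0.5)
cor(df_bin)
colMeans(df_bin)

# Unequal margins work the same way
df_skew <- rbinary_pair(1e5, p1 = 0.3, p2 = 0.6, r = 0.4)
cor(df_skew)
```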