How can I simulate a data frame with variables that correlate to each other?

Question

I am an R newbie and created a data frame in which I gave every product an invented probability:

set.seed(10)
data <- data.frame(orderId=sample(c(1:10000), 100000, replace=TRUE),
                   product=sample(c('P1','P2','P3','P4','P5','P6','P7','P8','P9','P10', 'P11','P12','P13','P14','P15',
                                    'P16','P17','P18','P19','P20',
                                    'z1','z2','z3','z4','z5','z6','z7','z8','z9','z10','z11','z12','z13','z14','z15',
                                    'z16','z17','z18','z19','z20','z21','z22','z23','z24','z25','z26','z27','z28',
                                    'z29','z30','z31','z32','z33','z34','z35','z36','z37','z38','z39','z40')
                                  ,100000, replace=TRUE,
                                  prob=c(0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
                                         0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03,
                                         0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
                                         0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03,
                                         0.02, 0.03, 0.01, 0.015, 0.023, 0.027, 0.009, 0.013, 0.04, 0.006,
                                         0.018, 0.013, 0.025, 0.011, 0.003, 0.007, 0.02, 0.014, 0.01, 0.03)))

Is it possible to simulate them so that some variables are correlated with each other (e.g. P1, P4, P8, z1 and z3 are highly correlated)? I need this to run a factor analysis in R. Thanks.

Here's a question about making efficient draws from a multivariate normal distribution. [Possible dupe?](http://stackoverflow.com/q/22738355/903061) — Gregor Thomas, Feb 08 '17 at 23:52
What Gregor said. Check out this page: http://stat.ethz.ch/R-manual/R-devel/library/MASS/html/mvrnorm.html. To convert from the normal distribution to a uniform probability, you can then calculate quantiles. — thc, Feb 08 '17 at 23:54
I would say use your probs as the means, use `prob * (1 - prob)` as the variances, and set whatever covariances you'd like. — Gregor Thomas, Feb 08 '17 at 23:56
Possible duplicate of [Efficiently radomly drawing from a multivariate normal distribution](http://stackoverflow.com/questions/22738355/efficiently-radomly-drawing-from-a-multivariate-normal-distribution) — nrussell, Feb 09 '17 at 00:03
Thanks for the fast support. I visited the links. I understand the basic principle, but I have no idea how to implement this in my code. Does anyone have a hint? Sorry if this is too impolite to ask. — Marre, Feb 09 '17 at 00:21
are you trying to simulate products purchased in customers' orders? — chinsoon12, Feb 09 '17 at 01:45
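
A minimal sketch of the `MASS::mvrnorm` idea from the comments above, under stated assumptions rather than as a definitive implementation. It draws one correlated latent normal vector per order and marks a product as purchased when its latent value passes a threshold. The seven-product subset, the probabilities, the latent correlation of 0.8 and the names `Sigma`, `group`, `latent`, `thresholds` and `orders` are all illustrative choices, not values from the question. Thresholding is one way to turn normal draws into 0/1 indicators; it is not exactly the mean/variance setup proposed in the comments.

```r
library(MASS)  # provides mvrnorm(); MASS ships with standard R installations

set.seed(10)

# Hypothetical subset of products with illustrative marginal purchase
# probabilities (not the question's full 60-product table).
products <- c("P1", "P4", "P8", "z1", "z3", "P2", "z2")
prob     <- c(0.30, 0.25, 0.20, 0.30, 0.25, 0.30, 0.20)

# Latent correlation matrix: the first five products share an assumed
# correlation of 0.8; P2 and z2 are independent of everything else.
Sigma <- diag(length(products))
dimnames(Sigma) <- list(products, products)
group <- c("P1", "P4", "P8", "z1", "z3")
Sigma[group, group] <- 0.8
diag(Sigma) <- 1

# Draw one latent multivariate normal vector per order.
n_orders <- 10000
latent <- mvrnorm(n_orders, mu = rep(0, length(products)), Sigma = Sigma)

# A product is in the order when its latent value exceeds the threshold
# that leaves the requested probability in the upper tail.
thresholds <- qnorm(1 - prob)
orders <- sweep(latent, 2, thresholds, FUN = ">") * 1L
colnames(orders) <- products
orders <- as.data.frame(orders)

round(colMeans(orders), 2)  # empirical purchase rates per product
round(cor(orders), 2)       # correlated block vs. independent products
```

The correlations between the 0/1 columns come out lower than the latent 0.8, and they shrink further when the purchase probabilities are as small as those in the question, so the latent value may need tuning. A wide numeric data frame of this shape (one row per order, one column per product) is the kind of input `factanal()` accepts.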

score 0 · Answer 1 · answered Feb 09 '17 at 02:24

How about this:

# Number of simulated observations
amountofsample <- 100

# Create a linearly increasing variable
a <- 101:(100 + amountofsample)

# Randomly sample a multiplication factor from a narrow range
# (the tighter the range, the closer the correlation will be to 1)
b <- sample(seq(1, 1.1, 0.01), amountofsample, replace = TRUE)

# Multiply the original values by the random factors
f <- a * b

# Create a data.frame with both simulated columns
simulated <- data.frame(a, f)
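
To see how the width of the factor range drives the correlation, here is a small self-contained check in base R. The helper `simulate_pair()` and the upper bounds 1.1 and 3 are hypothetical, chosen only for illustration.

```r
set.seed(10)
n <- 100
a <- 101:(100 + n)

# Hypothetical helper: multiply `a` by random factors drawn from
# the grid 1, 1.01, ..., upper and return both columns.
simulate_pair <- function(a, upper) {
  b <- sample(seq(1, upper, by = 0.01), length(a), replace = TRUE)
  data.frame(a = a, f = a * b)
}

narrow <- simulate_pair(a, upper = 1.1)  # tight range of factors
wide   <- simulate_pair(a, upper = 3)    # much wider range of factors

cor(narrow$a, narrow$f)  # close to 1
cor(wide$a, wide$f)      # noticeably lower
```

Note that this approach only controls the correlation between one pair of columns; a block of several mutually correlated products needs a joint model such as a multivariate normal draw.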
