In my simulation study I need to come up with a covariance matrix for multivariate data. My data:

    dataset <- data.frame(
      observation    = rep(1:8, 2),
      plot           = rep(1:4, each = 2),  # length 8, recycled to 16 rows
      time           = rep(1:2, 8),
      treatment      = rep(c("A", "B", "A", "B"), each = 4),
      OutputVariable = rep(c("P", "Q"), each = 8)
    )

This dataset is multivariate: for every observation (1:8) there is more than one result. In this case, we observe a value for OutputVariable P and a value for OutputVariable Q at the same time. Note that the actual outputs are not in this dataset, as I will generate them at a later stage.

The desired covariance matrix would be 16x16, where CovarMat[2,9] is the covariance between row 2 of the dataset (observation 2 of variable P) and row 9 (observation 1 of variable Q).

The value of, for instance, CovarMat[2,9] is based on rules like these:

  • CovarMat[2,9]=0
  • If dataset$plot[2]==dataset$plot[9] then CovarMat[2,9]=CovarMat[2,9]+1.5
  • If dataset$time[2]==dataset$time[9] then CovarMat[2,9]=CovarMat[2,9]+1.5
  • If (dataset$plot[2]==dataset$plot[9])&(dataset$time[2]==dataset$time[9]) then CovarMat[2,9]=CovarMat[2,9]+3
  • If abs(dataset$time[2]-dataset$time[9])==1 then CovarMat[2,9]=CovarMat[2,9]+2
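
To make the rules concrete for an arbitrary cell [i,j] rather than only [2,9], here is a minimal sketch of the loop version. It assumes the same rules apply unchanged to every pair of rows, including the diagonal; the names `same_plot`, `same_time` and `value` are only illustrative.

```r
# Toy dataset from the question (16 rows: 8 observations x 2 output variables).
dataset <- data.frame(
  observation    = rep(1:8, 2),
  plot           = rep(1:4, each = 2),
  time           = rep(1:2, 8),
  treatment      = rep(c("A", "B", "A", "B"), each = 4),
  OutputVariable = rep(c("P", "Q"), each = 8)
)

n <- nrow(dataset)
CovarMat <- matrix(0, n, n)

# Assumption: every cell [i, j] follows the same rules as the [2, 9] example.
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    same_plot <- dataset$plot[i] == dataset$plot[j]
    same_time <- dataset$time[i] == dataset$time[j]

    value <- 0
    if (same_plot)              value <- value + 1.5
    if (same_time)              value <- value + 1.5
    if (same_plot && same_time) value <- value + 3
    if (abs(dataset$time[i] - dataset$time[j]) == 1) value <- value + 2

    CovarMat[i, j] <- value
  }
}
```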

Using for-loops, that's easy enough (and that's what I have done up to now). But my current dataset has 13,200 rows, so my CovarMat consists of 13,200 x 13,200 = 174,240,000 cells, and looping over every cell is too slow. I am therefore in desperate need of a more efficient way.
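
One loop-free possibility is to express each rule as a whole-matrix comparison with base R's `outer()`. The sketch below assumes the rules apply to every pair of rows in the same way as for [2,9]; the intermediate names `same_plot`, `same_time` and `lag_one` are made up for illustration.

```r
# Toy dataset from the question (16 rows: 8 observations x 2 output variables).
dataset <- data.frame(
  observation    = rep(1:8, 2),
  plot           = rep(1:4, each = 2),
  time           = rep(1:2, 8),
  treatment      = rep(c("A", "B", "A", "B"), each = 4),
  OutputVariable = rep(c("P", "Q"), each = 8)
)

# outer() applies a vectorized function to every pair of rows at once,
# returning an n x n matrix.
same_plot <- outer(dataset$plot, dataset$plot, "==")
same_time <- outer(dataset$time, dataset$time, "==")
lag_one   <- abs(outer(dataset$time, dataset$time, "-")) == 1

# Logical matrices are coerced to 0/1 in arithmetic, so each rule is one term.
CovarMat <- 1.5 * same_plot +
            1.5 * same_time +
            3.0 * (same_plot & same_time) +
            2.0 * lag_one
```

With 13,200 rows, each dense numeric matrix of this size takes roughly 1.4 GB (174,240,000 cells x 8 bytes), and the logical intermediates add to that. If memory is tight, it may help to add the terms to CovarMat one at a time and drop each intermediate with `rm()` once it has been used.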

  • Check out the `Rcpp` and `RcppArmadillo` packages; the latter uses the C++ library Armadillo. Although it may seem overkill if you are not familiar with C++, it may save you time in the long run. Both packages, and Armadillo itself, are well supported and documented, with active developers. There are various questions on here regarding large matrices and RcppArmadillo (http://stackoverflow.com/questions/18866130/passing-large-matrices-to-rcpparmadillo-function-without-creating-copy-advanced). Also look at `sparse_matrix_type` in the Armadillo docs: http://arma.sourceforge.net/docs.html – Rusan Kax Sep 01 '14 at 15:36
  • Thank you for your suggestion, but it does seem overkill to me, especially since the size of the matrix itself does not cause major problems. – Nightingale Sep 02 '14 at 13:15
  • You'll need to provide more info on how you want your cov matrix to be structured. – Hong Ooi Sep 02 '14 at 13:46
