I would like to perform a three-way principal component analysis in R, and though I have found a few articles explaining how it works and how to interpret results, I cannot find any useful guides online on how to do it in R.
My data consists of 230 samples, 250,000 variables and 50 annotations. Usually people just do a standard PCA using only one annotation on the following type of data:
Standard data:
var1 var2 var3 var4
Sample1 1/1 0/0 1/1 1/0
Sample2 1/0 1/1 1/1 1/0
Sample3 0/0 1/1 1/1 1/1
Sample4 0/0 0/0 1/1 0/0
Sample5 1/0 1/0 0/0 1/1
However, I would like to implement all annotation information into the analysis, such that I use all 50 matrices combindly for the analysis. In this way a combination of annotations may explain more of the variance between samples than a single annotation does alone, e.g. annotation 1 and 4 together explain more variance that annotation 1 alone.
Annotation 1:
var1 var2 var3 var4
Sample1 1/1 0/0 1/1 1/0
Sample2 1/0 1/1 1/1 1/0
Sample3 0/0 1/1 1/1 1/1
Sample4 0/0 0/0 1/1 0/0
Sample5 1/0 1/0 0/0 1/1
Annotation 2:
var1 var2 var3 var4
Sample1 missense none STOP synonymous
Sample2 missense missense STOP synonymous
Sample3 none missense STOP synonymous
Sample4 none none STOP none
Sample5 missense missense none synonymous
Annotation 3:
var1 var2 var3 var4
Sample1 0.30 0.00 0.01 0.04
Sample2 0.30 -0.24 0.01 0.04
Sample3 0.00 -0.24 0.01 0.04
Sample4 0.00 -0.24 0.01 0.00
Sample5 0.30 -0.24 0.00 0.04
Annotation 4:
var1 var2 var3 var4
Sample1 CTCF NONE NONE MAX
Sample2 CTCF NONE NONE MAX
Sample3 NONE NONE NONE MAX
Sample4 NONE NONE NONE NONE
Sample5 CTCF NONE NONE MAX
From what I have found there are three packages that can do the Tucker 3-way PCA: ThreeWay, PTAk and rTensor. I have tried to run ThreeWay, but the data structure that they use seems very ugly to work with. Maybe I could make this work, but the example in the ThreeWay article also generated an error, so I would prefer another package:
ThreeWay data structure:
var1_anno1 var1_anno2 var1_anno3 var2_anno1 var2_anno2
Sample1 1/1 missense 0.30 0/0 missense
Sample2 1/0 missense 0.30 1/1 missense
Sample3 0/0 none 0.30 1/1 missense
Sample4 0/0 none 0.30 0/0 none
Sample5 1/0 missense 0.30 1/0 missense
The PTAk packages requires: "a tensor (as an array) of order k, if non-identity metrics are used X is a list with data as the array and met a list of metrics"
It is not clear to me what this means. I tried to look into the tensor packages of how to generate a tensor, but their example is very convoluted as they do tons of multiplications on various tensors, rather than explain the basics of how to create a tensor form ones data.
I would appreciate both comments on weaknesses of this approach and on how to create tensors, and how to analyse them using any of the packages.
Thanks