I have a character matrix with 7358444 rows and 110 columns. It looks like this:
FORMAT eQTL188 eQTL193 eQTL178 eQTL179 eQTL238
[1,] "GT:DS:GP" "0/1:0.79:0.221,0.767,0.011" "0/0:0.031:0.97,0.03,0" "0/0:0.033:0.967,0.033,0" "0/0:0.079:0.922,0.077,0.001" "0/0:0.344:0.664,0.329,0.007"
[2,] "GT:DS:GP" "0/0:0.047:0.953,0.047,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.525:0.477,0.52,0.002"
[3,] "GT:DS:GP" "0/0:0.047:0.953,0.047,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.527:0.476,0.521,0.003"
[4,] "GT:DS:GP" "0/0:0.048:0.952,0.048,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.518:0.485,0.512,0.003"
I need to calculate, for each of my samples (the columns matching the pattern eQTL), the dosage of allele 1. This can be calculated from the GP values, which come after the second : in each cell. The formula I need to apply is P(A1) = 2*P(A1/A1) + P(A1/A2), where P(A1/A1) is the first GP value and P(A1/A2) is the second one.
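To make the formula concrete, here is a minimal R sketch on a toy version of the matrix (the first two rows and two samples from the example). The helper name allele1_dosage is hypothetical, and the regular expressions assume every sample cell has exactly the GT:DS:GP layout shown above with three comma-separated GP values; it is an illustration of the calculation, not a tuned solution for the full matrix.

```r
# Toy matrix mirroring the first two rows of the example (FORMAT column included).
m <- matrix(
  c("GT:DS:GP", "0/1:0.79:0.221,0.767,0.011", "0/0:0.031:0.97,0.03,0",
    "GT:DS:GP", "0/0:0.047:0.953,0.047,0",    "0/0:0.007:0.993,0.007,0"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("FORMAT", "eQTL188", "eQTL193"))
)

# Hypothetical helper: allele-1 dosage for every sample column.
# Assumes each cell is "GT:DS:GP" with GP = "P(A1/A1),P(A1/A2),P(A2/A2)".
allele1_dosage <- function(m) {
  # Keep only the sample columns (names starting with "eQTL").
  samples <- m[, grepl("^eQTL", colnames(m)), drop = FALSE]
  # Drop the GT and DS fields, leaving only the GP field.
  gp <- sub("^[^:]*:[^:]*:", "", samples)
  # First GP value: P(A1/A1).
  p11 <- as.numeric(sub(",.*$", "", gp))
  # Second GP value: P(A1/A2).
  p12 <- as.numeric(sub("^[^,]*,([^,]*),.*$", "\\1", gp))
  # as.numeric() drops the matrix shape, so rebuild it (column-major order is kept).
  matrix(2 * p11 + p12, nrow = nrow(samples), dimnames = dimnames(samples))
}

allele1_dosage(m)
```

Because sub() and the arithmetic are vectorised over the whole matrix, there is no explicit loop over cells, but on 7358444 rows the intermediate character copies may still be costly in memory, so processing the matrix in blocks of rows might be needed.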
The result I am looking for is a numeric matrix like this:
eQTL188 eQTL193 eQTL178 eQTL179 eQTL238
[1,] 1.209 1.970 1.967 1.921 1.657
[2,] 1.953 1.993 1.994 1.992 1.474
[3,] 1.953 1.993 1.994 1.992 1.473
[4,] 1.952 1.993 1.994 1.992 1.482
Since the matrix is huge, speed could be an issue.