1

I need to apply a corr.test on all my 4 variables based on a sliding window of 4 and step 1. I could do it for 2 variables at a time, is it possible to do it on 4 variables?

library("gtools")
library("psych")
pos <- subs$POS
result <- running(subs[,2], subs[,3], fun=corr.test,  width=4, by=1)
n <-length(pos)
intervals <- paste(pos[1:(n-3)],pos[4:n], sep =':')
names(result) <- intervals

subs <- structure(list(POS = c(92, 92, 94, 94, 101, 103, 105, 106, 107, 
113, 114, 120, 120, 122, 132, 136, 140, 143, 143, 146), var1 = c(1, 
6, 1, 1, 3, 1, 1, 2, 10, 1, 1, 5, 1, 1, 5, 8, 2, 1, 1, 17), var2 = c(11355.6578947368, 
11355.6578947368, 11347.8157894737, 11347.8157894737, 11450.0789473684, 
11478, 11473.1052631579, 11479.7105263158, 11495.1315789474, 
11642.6052631579, 11691.2631578947, 11612.0526315789, 11612.0526315789, 
11497.7368421053, 11580.1578947368, 11421.6842105263, 11288.1052631579, 
11288.0263157895, 11288.0263157895, 11278), var3 = c(1.37171511940581, 
1.37171511940581, 1.38015533583543, 1.38015533583543, 1.38285457974173, 
1.38467221575929, 1.38728237171707, 1.38800712625898, 1.38790768043085, 
1.39036602407455, 1.39189385417378, 1.39598161236479, 1.39598161236479, 
1.39737128761462, 1.38302756354476, 1.37244652743204, 1.3632470398133, 
1.35191834216266, 1.35191834216266, 1.33779404050067), var4 = c(1.04285684958464, 
1.04285684958464, 1.04296690407379, 1.04296690407379, 1.04244689713763, 
1.04242150761792, 1.04233095775203, 1.0422836372936, 1.04225281778793, 
1.04210777961787, 1.04202037841291, 1.04165492323573, 1.04165492323573, 
1.04161860236611, 1.04065255428577, 1.03999715968399, 1.03927165790059, 
1.03890373057753, 1.03890373057753, 1.03861585865751)), row.names = c(NA, 
20L), class = "data.frame")

If I try this:

result <- running(subs[,2:5], fun=corr.test, 
                   width=4, by=1)

I get this error:

Error in seq.default(from = 1, to = length(run.elements), by = by) : 
  wrong sign in 'by' argument
user3224522
  • 1,119
  • 8
  • 19
  • I think this post may be helpful to you: https://stackoverflow.com/questions/5446426/calculate-correlation-for-more-than-two-variables/38659933 – Anoushiravan R Apr 15 '21 at 10:01
  • I was actually able to do corr.test(subs[,2:ncol(subs]) and it works, the problem I see here are my sliding windows and application of the corr.test to all columns, may be some apply function? – user3224522 Apr 15 '21 at 10:05
  • You would like to apply in on var1 to var4? – Anoushiravan R Apr 15 '21 at 10:13
  • exactly, since the first column is position, I would like to apply it to all dataset except 1st column.. – user3224522 Apr 15 '21 at 10:14
  • I have found a package with regard to calculating sliding window with particular number, again I'm not sure if this is exactly what you have in mind cause I've never done it before. Just check it please there might be something useful: https://cran.r-project.org/web/packages/RolWinMulCor/RolWinMulCor.pdf – Anoushiravan R Apr 15 '21 at 10:52
  • 1
    I do not get an error! Your code is working on my machine. – TarJae Apr 15 '21 at 10:55
  • does it work if you add result <- running(subs[,2:5], fun=corr.test, width=4, by=1, method="pearson", use = "pairwise.complete.obs", adjust ="BH") ? – user3224522 Apr 15 '21 at 11:03
  • @user3224522 yes it also works for me, both cases. – Anoushiravan R Apr 15 '21 at 11:32
  • the output is wrong, I checked – user3224522 Apr 15 '21 at 12:27

2 Answers2

2

Define a function f that outputs a plain vector of whatever statistics that you want, e.g. p.adj, and then use rollapplyr with by.column = FALSE. (If the output of f were to have length 1 then replace rownames with names.)

library(zoo)

f <- function(x) corr.test(x)$p.adj # change this to whatever you want
r <- rollapplyr(subs[-1], 4, f, by.column = FALSE)
rownames(r) <- intervals  # intervals defined in question
r

giving this matrix whose rows are intervals and whose columns are statistics:

             [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
92:94   1.00000000 1.000000000 0.00000000 1.00000000 0.00000000 0.00000000
92:101  1.00000000 1.000000000 1.00000000 1.00000000 0.05655636 1.00000000
94:103  1.00000000 1.000000000 0.07544652 1.00000000 0.05425068 0.18885878
94:105  1.00000000 1.000000000 0.56802781 1.00000000 0.09806659 0.56802781
101:106 1.00000000 1.000000000 1.00000000 1.00000000 1.00000000 0.14343590
103:107 0.16535559 1.000000000 1.00000000 1.00000000 1.00000000 0.21105126
105:113 1.00000000 1.000000000 0.09584584 1.00000000 0.11065442 0.09584584
106:114 1.00000000 1.000000000 0.05139145 1.00000000 0.03790992 0.04056634
107:120 0.18617738 1.000000000 1.00000000 1.00000000 1.00000000 0.04339496
113:120 1.00000000 1.000000000 1.00000000 1.00000000 1.00000000 0.01435381
114:122 1.00000000 1.000000000 0.52583100 1.00000000 0.82197327 0.10782051
120:132 1.00000000 1.000000000 1.00000000 1.00000000 1.00000000 0.05212361
120:136 1.00000000 0.005975605 1.00000000 0.00346566 1.00000000 0.01157140
122:140 1.00000000 1.000000000 1.00000000 1.00000000 1.00000000 0.00542238
132:143 0.80881021 0.808810209 0.28263145 0.80881021 0.13299017 0.07881523
136:143 0.05864769 0.179665391 0.17966539 0.10236170 0.17437441 0.10236170
140:146 0.01366799 0.549918470 0.54991847 0.54991847 0.54991847 0.06677469
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • thanks a lot! where can I add method="pearson", use = "pairwise.complete.obs", adjust ="BH" ? – user3224522 Apr 15 '21 at 11:57
  • why di we have 6 columns in the output? If I print r values I get 4 values for each var... – user3224522 Apr 15 '21 at 12:07
  • I did read the answer, it wasnt clear why do we have 6 values for p.adj and what each column stands for. – user3224522 Apr 15 '21 at 12:15
  • 1
    Presumably they are the adjusted p values for each of the 6 pairs of columns. I am assuming that is the most likely set of statistics that you are after but you are free to define f in any way you like to get any statistics you want. In ?corr.test it says they correspond to the upper triangular part of an nxn matrix (here n = 4). See the description of the p output in ?corr.test. – G. Grothendieck Apr 15 '21 at 12:41
1
library(corrplot)
cor_subs <- cor(subs)
cor_subs

Output:

            POS       var1       var2       var3       var4
POS   1.0000000  0.3256234 -0.1769368 -0.5323549 -0.9662176
var1  0.3256234  1.0000000 -0.1890354 -0.4210894 -0.3612658
var2 -0.1769368 -0.1890354  1.0000000  0.8202677  0.4028399
var3 -0.5323549 -0.4210894  0.8202677  1.0000000  0.7257573
var4 -0.9662176 -0.3612658  0.4028399  0.7257573  1.0000000

With significance level:

rcorr(cor_subs, type = c("pearson","spearman"))

Output:

       POS  var1  var2  var3  var4
POS   1.00  0.70 -0.66 -0.86 -0.99
var1  0.70  1.00 -0.81 -0.88 -0.76
var2 -0.66 -0.81  1.00  0.94  0.75
var3 -0.86 -0.88  0.94  1.00  0.92
var4 -0.99 -0.76  0.75  0.92  1.00

n= 5 


P
     POS    var1   var2   var3   var4  
POS         0.1844 0.2242 0.0585 0.0010
var1 0.1844        0.1001 0.0475 0.1333
var2 0.2242 0.1001        0.0184 0.1420
var3 0.0585 0.0475 0.0184        0.0255
var4 0.0010 0.1333 0.1420 0.0255  

Adding a plot

corrplot(cor_subs, method="number", type="lower")

enter image description here

TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    Great solution. I think this is the one she/he was looking for. And also great visualization by the way. – Anoushiravan R Apr 15 '21 at 10:37
  • 1
    thanks a lot for this solution. But again the same problem, it does not take in consideration a sliding window based on the position, which is clearly shown in my code above. – user3224522 Apr 15 '21 at 10:38
  • In your first not edited edition of your question that was not so clear for me. Sorry for that. – TarJae Apr 15 '21 at 10:42
  • 1
    sorry, my fault, i put it in the code but did not spicfy in the question. – user3224522 Apr 15 '21 at 10:46