
I have a data frame with measurements made by different raters, and I want to calculate the correlation of measurements between raters.

Here's my current implementation with dummy data:

library(data.table)

set.seed(123)
df <- data.table(
  groups = rep(seq(1, 4, 1), 100),  # rater ID, cycling 1, 2, 3, 4
  measurement = runif(400)
)

# one row and one column per rater
cormat <- matrix(ncol = length(unique(df$groups)), nrow = length(unique(df$groups)))

# the group IDs are 1..4, so they double as matrix indices
for (i in unique(df$groups)) {
  for (j in unique(df$groups)) {
    cormat[i, j] <- cor(df[groups == i, ]$measurement, df[groups == j, ]$measurement)
  }
}

I hate the nested loop above, and would ideally like to find a dplyr/tidyverse approach to my problem.

The expected output is:

> cormat
           [,1]        [,2]        [,3]        [,4]
[1,]  1.0000000 -0.10934904 -0.15159825  0.13237094
[2,] -0.1093490  1.00000000 -0.04278137 -0.02945215
[3,] -0.1515983 -0.04278137  1.00000000  0.04203516
[4,]  0.1323709 -0.02945215  0.04203516  1.00000000

(apologies if this question has been asked before, I was struggling to find a good search term)

Otto Kässi
  • Possible duplicate of [Correlation between groups in R data.table](https://stackoverflow.com/questions/22421542/correlation-between-groups-in-r-data-table) – Koot6133 Nov 22 '17 at 12:48
  • Thanks for your comment! This is useful, but I find the tidyverse approach below more elegant. – Otto Kässi Nov 22 '17 at 14:43

1 Answer

Here is a tidyverse approach.

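library(tidyverse)

df %>%
  arrange(groups) %>%                            # make each rater's rows contiguous
  add_column(index = rep(1:100, times = 4)) %>%  # row ID within each rater (100 measurements each)
  spread(groups, measurement) %>%                # one column per rater
  select(-index) %>%                             # drop the helper column
  cor()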

Result

           1           2           3           4
1  1.0000000 -0.10934904 -0.15159825  0.13237094
2 -0.1093490  1.00000000 -0.04278137 -0.02945215
3 -0.1515983 -0.04278137  1.00000000  0.04203516
4  0.1323709 -0.02945215  0.04203516  1.00000000

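We need the index column so that every measurement has a unique identifier within its group. Without it, spread() cannot tell which measurements belong in the same row, because the rows are not uniquely identified.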
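
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same idea follows, assuming that measurements are paired across raters by their order of appearance within each group (the same pairing the nested loop in the question uses). Building the index with row_number() per group, rather than hard-coding rep(1:100, times = 4), is my own substitution and not part of the original answer; the dummy data is recreated as a plain data.frame so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Recreate the dummy data from the question as a plain data.frame
set.seed(123)
df <- data.frame(
  groups = rep(1:4, times = 100),
  measurement = runif(400)
)

cormat <- df %>%
  group_by(groups) %>%
  mutate(index = row_number()) %>%  # position of each measurement within its rater
  ungroup() %>%
  pivot_wider(names_from = groups, values_from = measurement) %>%  # one column per rater
  select(-index) %>%
  cor()

cormat
```

Because the index is derived from the data, this variant should not depend on there being exactly 100 measurements per rater, although raters with unequal counts would produce NA cells that cor() does not handle by default.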


edit

A base R approach might be

cor(unstack(df, measurement ~ groups))
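
To see what the one-liner does, here is a rough sketch that splits it into steps and checks it against a loop-style reference. The names wide, cormat_base, cormat_loop and ids are illustrative only, and the data is recreated as a plain data.frame so the block is self-contained. Note that unstack() only returns a data frame when every group has the same number of values; otherwise it returns a list, and cor() would fail.

```r
set.seed(123)
df <- data.frame(
  groups = rep(1:4, times = 100),
  measurement = runif(400)
)

# One column per rater, in order of appearance within each group
wide <- unstack(df, measurement ~ groups)
cormat_base <- cor(wide)

# Reference: the pairwise computation from the question, written with sapply
ids <- sort(unique(df$groups))
cormat_loop <- sapply(ids, function(j) {
  sapply(ids, function(i) {
    cor(df$measurement[df$groups == i], df$measurement[df$groups == j])
  })
})

# Compare the values only; the two matrices carry different dimnames
all.equal(unname(cormat_base), cormat_loop)
```
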
markus