Here is an alternative approach which creates all unique combinations of `TIME`, `TYPE`, and duplicated `GROUP`s through a cross join and then computes the correlation of `SCORE` for the corresponding subsets of `DATA`:
```r
library(data.table) # development version 1.14.3 required (for by = .I)
setDT(DATA, key = c("GROUP", "TYPE", "TIME"))[          # keyed data.table
  , CJ(time = TIME, type = TYPE, groupA = GROUP, groupB = GROUP, unique = TRUE)][
    groupA < groupB][                                   # each group pair once
      , corType := paste0("G", groupA, "G", groupB)][
        , corValue := cor(DATA[.(groupA, type, time), SCORE],
                          DATA[.(groupB, type, time), SCORE]),
          by = .I][]                                    # one group per row
```
```
    time type groupA groupB corType    corValue
 1:  100    1      1      2    G1G2  0.11523940
 2:  100    1      1      3    G1G3 -0.05124326
 3:  100    1      1      4    G1G4 -0.16943203
 4:  100    1      2      3    G2G3  0.05475435
 5:  100    1      2      4    G2G4 -0.10769738
 6:  100    1      3      4    G3G4  0.01464146
 7:  100    2      1      2    G1G2          NA
 8:  100    2      1      3    G1G3          NA
 9:  100    2      1      4    G1G4          NA
10:  100    2      2      3    G2G3          NA
11:  100    2      2      4    G2G4          NA
12:  100    2      3      4    G3G4          NA
13:  101    1      1      2    G1G2          NA
14:  101    1      1      3    G1G3          NA
15:  101    1      1      4    G1G4          NA
16:  101    1      2      3    G2G3          NA
17:  101    1      2      4    G2G4          NA
18:  101    1      3      4    G3G4          NA
19:  101    2      1      2    G1G2 -0.04997479
20:  101    2      1      3    G1G3 -0.02262932
21:  101    2      1      4    G1G4 -0.00331578
22:  101    2      2      3    G2G3 -0.01243952
23:  101    2      2      4    G2G4  0.16683223
24:  101    2      3      4    G3G4 -0.10556083
    time type groupA groupB corType    corValue
```
Explanation
- `DATA` is coerced to class `data.table` while setting a key on columns `GROUP`, `TYPE`, and `TIME`. Keying is required for fast subsetting later.
- The cross join `CJ()` creates all unique combinations of columns `TIME`, `TYPE`, and `GROUP` (twice). The columns of the cross join have been renamed (`time`, `type`, `groupA`, `groupB`) to avoid name clashes with the columns of `DATA` when it is subsetted later on.
- `[groupA < groupB]` ensures that equivalent combinations of `groupA` and `groupB` only appear once, e.g., `G2G1` is dropped in favour of `G1G2`; it also drops pairs of a group with itself, such as `G1G1`. So, this is a kind of data.table version of `t(combn(unique(DATA$GROUP), 2))`.
- A new column `corType` is appended by reference.
- Finally, the groupwise correlations are computed by stepping row-wise through the cross join table (using `by = .I`) and subsetting `DATA` by `groupA`, `type`, `time` and by `groupB`, `type`, `time`, respectively, using fast subsetting through keys (see the vignette *Keys and fast binary search based subset* for more details). `cor()` pairs the two `SCORE` vectors element by element, so both subsets need to have the same length (100 rows each with the sample data).
Note that `by = .I` is a new feature of the data.table development version 1.14.3.
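For data.table versions without `by = .I`, one possible workaround (a sketch, not part of the approach above) is to add a helper row-id column, group by it, and drop it afterwards. The column name `rowId` is an arbitrary, hypothetical choice; the sample data are recreated from the Data section so the block runs on its own.

```r
library(data.table)

# Sample data, equivalent to the Data section
set.seed(42)
DATA <- data.frame(GROUP = rep(1:4, each = 200),
                   TYPE  = rep(1:2, 400),
                   TIME  = rep(100:101, 400),
                   SCORE = sample(1:100, 800, replace = TRUE))
setDT(DATA, key = c("GROUP", "TYPE", "TIME"))

# Same chain as above, but grouping by an explicit helper column rowId
# (hypothetical name) instead of by = .I
res <- DATA[, CJ(time = TIME, type = TYPE, groupA = GROUP, groupB = GROUP, unique = TRUE)][
  groupA < groupB][
    , corType := paste0("G", groupA, "G", groupB)][
      , rowId := seq_len(.N)][                         # one id per row
        , corValue := cor(DATA[.(groupA, type, time), SCORE],
                          DATA[.(groupB, type, time), SCORE]),
          by = rowId][
            , rowId := NULL][]                         # drop helper column
res
```

The extra column costs one additional pass over a small table (24 rows here), so the difference to `by = .I` is only in convenience.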
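Combinations of time, type, and group which do not exist in `DATA` will appear in the result set but are marked by `NA` in column `corValue`. In the sample data, `TYPE` and `TIME` always change together, so the combinations `time = 100, type = 2` and `time = 101, type = 1` never occur.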
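Where those `NA`s come from can be seen on a tiny keyed table. This is a minimal sketch with made-up values (`DT` is hypothetical): a keyed lookup of a non-existing combination returns a single `NA` row under data.table's default `nomatch = NA`, and `cor()` of such a vector is `NA`.

```r
library(data.table)

# Tiny keyed table with made-up values, for illustration only
DT <- data.table(GROUP = c(1L, 1L, 2L, 2L),
                 TYPE  = 1L,
                 TIME  = 100L,
                 SCORE = c(10L, 20L, 30L, 50L),
                 key   = c("GROUP", "TYPE", "TIME"))

# Existing combination: binary-search lookup on the key returns both rows
hit <- DT[.(1L, 1L, 100L), SCORE]    # 10 20

# Non-existing combination (TYPE 2 does not occur): with the default
# nomatch = NA the join returns one row whose SCORE is NA
miss <- DT[.(1L, 2L, 100L), SCORE]   # NA

cor(miss, miss)                      # NA, as in column corValue
```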
Data
```r
set.seed(42) # required for reproducible data
DATA = data.frame("GROUP" = rep(1:4, each = 200),
                  "TYPE" = rep(1:2, 400),
                  "TIME" = rep(100:101, 400),
                  "SCORE" = sample(1:100, 800, replace = TRUE))
```