I posted a similar question earlier this week (For loop vectorization) and am very appreciative of the answer provided. This question follows up on the use of the data.table package and its ability to cycle through the groups of two data.tables to perform a specific calculation.
Background:
I recreated the data using the same format as before and included the answer provided by @jlhoward. In this example, dt1 consists of subjects and their measured values (the values column), grouped by group and time. Additionally, dt2 holds the measured values of a single subject, which serves as a reference with a known status.
The group_mean function, when applied, cycles through each subject's measured values by group and time and compares them to the measured values of the reference subject in dt2. The resulting p_value (from levene.test in the lawstat package) indicates whether the variance of each subject's measured values differs from that of the reference.
> library(data.table)
> library(lawstat)
> dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20,20,20))),
+ time = factor(rep(1:5, 12)),
+ values = runif(60, min = 0, max = 1),
+ key = c("group", "time"))
> dt2 <- data.table(group = factor(rep(LETTERS[4], 15)),
+ values = runif(15, min = 0, max = 1),
+ key = "group")
#Provided by @jlhoward#
> group_mean <- function(values)
+ {
+ example <- data.table(group = rep(c("T", "C"), c(length(values), nrow(dt2))),
+ values = c(values, dt2$values))
+ with(example, levene.test(values, group)$p.value)
+ }
> dt1[, list(p_value = group_mean(values)), by = c("group", "time")]
group time p_value
1: A 1 0.009812081
2: A 2 0.840368463
3: A 3 0.883976812
4: A 4 0.961928210
5: A 5 0.132638244
6: B 1 0.637280622
7: B 2 0.867169067
8: B 3 0.702991461
9: B 4 0.899523194
10: B 5 0.570103537
11: C 1 0.100202309
12: C 2 0.287019659
13: C 3 0.617098238
14: C 4 0.069866914
15: C 5 0.103481549
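For reference, a single row of this table can be reproduced by hand. The sketch below assumes a seed (the code above does not set one, so the p-value will not match the output shown) and runs the comparison for group A at time 1 directly.

```r
library(data.table)
library(lawstat)

set.seed(1)  # assumption: no seed is set in the post, so numbers differ from the table above

dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20, 20, 20))),
                  time = factor(rep(1:5, 12)),
                  values = runif(60, min = 0, max = 1),
                  key = c("group", "time"))
dt2 <- data.table(group = factor(rep(LETTERS[4], 15)),
                  values = runif(15, min = 0, max = 1),
                  key = "group")

# The four measurements for subject A at time 1
test_values <- dt1[group == "A" & time == "1", values]

# Label the test ("T") and reference ("C") observations, as group_mean does
labels <- rep(c("T", "C"), c(length(test_values), nrow(dt2)))

# Levene's test for equality of variances between the two sets
p_single <- levene.test(c(test_values, dt2$values), labels)$p.value
p_single
```

Each cell of dt1 contributes only four values, so the individual p-values should be read with that small sample size in mind.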
Problem:
At some point the status of other subjects may be determined, and their measured values would then be added to dt2 (represented by dt3 in the following code).
> dt3 <- data.table(group = factor(rep(LETTERS[4:6], c(15,15,15))),
+ values = c(dt2$values, runif(30, min = 0, max = 1)),
+ key = "group")
Question:
How could I modify the group_mean function so that it performs this variance test between the measured values of each subject in dt1 (by group and time) and each subject in dt3 (by group)?
While I could subset each subject from dt3 with a for loop, I am trying to better understand data.table and its syntax. An example of the desired output is below, with the p_value column intentionally left blank.
group time ref_group p_value
1: A 1 D
2: A 2 D
3: A 3 D
4: A 4 D
5: A 5 D
6: A 1 E
7: A 2 E
8: A 3 E
9: A 4 E
10: A 5 E
11: A 1 F
12: A 2 F
13: A 3 F
14: A 4 F
15: A 5 F
16: B 1 D
17: B 2 D
18: B 3 D
19: B 4 D
20: B 5 D
21: B 1 E
22: B 2 E
23: B 3 E
24: B 4 E
25: B 5 E
26: B 1 F
27: B 2 F
28: B 3 F
29: B 4 F
30: B 5 F
31: C 1 D
32: C 2 D
33: C 3 D
34: C 4 D
35: C 5 D
36: C 1 E
37: C 2 E
38: C 3 E
39: C 4 E
40: C 5 E
41: C 1 F
42: C 2 F
43: C 3 F
44: C 4 F
45: C 5 F
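One possible way to produce this layout, offered as a sketch rather than a definitive answer: split the reference values in dt3 by group, run the grouped comparison on dt1 once per reference subject, and stack the results with rbindlist. The helper name levene_p and the seed are assumptions introduced here for illustration, and dt3 is drawn fresh instead of reusing dt2$values so that the block is self-contained.

```r
library(data.table)
library(lawstat)

set.seed(1)  # assumption: seed added only for reproducibility

dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20, 20, 20))),
                  time = factor(rep(1:5, 12)),
                  values = runif(60, min = 0, max = 1),
                  key = c("group", "time"))
# Simplification: all reference values are drawn fresh rather than reusing dt2
dt3 <- data.table(group = factor(rep(LETTERS[4:6], c(15, 15, 15))),
                  values = runif(45, min = 0, max = 1),
                  key = "group")

# Hypothetical helper: Levene p-value for test values versus reference values
levene_p <- function(test_values, ref_values) {
  labels <- rep(c("T", "C"), c(length(test_values), length(ref_values)))
  levene.test(c(test_values, ref_values), labels)$p.value
}

# Reference values as a named list, one element per reference subject (D, E, F)
ref_list <- split(dt3$values, dt3$group)

# For each reference subject, compare it against every group/time cell of dt1
result <- rbindlist(lapply(names(ref_list), function(g) {
  ref_values <- ref_list[[g]]
  dt1[, list(ref_group = g, p_value = levene_p(values, ref_values)),
      by = c("group", "time")]
}))

# Match the row order of the desired output: group, then ref_group, then time
setorder(result, group, ref_group, time)
result
```

The lapply call still iterates over the reference subjects, so this keeps the per-cell grouping in data.table but does not remove the outer loop entirely.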
As always, thank you for your time and input/comments.