Calculations by group between two data.tables in R

Question

I posted a similar question earlier this week (For loop vectorization) and am very grateful for the answer I received. This question follows up on using the data.table package to cycle through the groups of two data.tables and perform a specific calculation for each pair of groups.

Background:

I recreated the data in the same format as before and included the answer provided by @jlhoward. In this example, dt1 contains subjects and their measured values (values), grouped by group and time. dt2 contains the measured values of only one subject, a reference with a known status.

When dt1 is grouped with by = c("group", "time"), group_mean receives the measured values of each subject at each time point and compares them with the measured values of the reference subject in dt2 using Levene's test (levene.test from the lawstat package). The resulting p_value indicates whether the variances of the two sets of values differ; a small p-value is evidence against equal variances.

> library(data.table)
> library(lawstat)

> dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20,20,20))),
+                   time = factor(rep(1:5, 12)),
+                   values = runif(60, min = 0, max = 1),
+                   key = c("group", "time"))
> dt2 <- data.table(group = factor(rep(LETTERS[4], 15)),
+                   values = runif(15, min = 0, max = 1),
+                   key = "group")

#Provided by @jlhoward#
> group_mean <- function(values)
+ {
+     example <- data.table(group = rep(c("T", "C"), c(length(values), nrow(dt2))),
+                           values = c(values, dt2$values))
+     with(example, levene.test(values, group)$p.value)
+ }

> dt1[, list(p_value = group_mean(values)), by = c("group", "time")]
    group time     p_value
 1:     A    1 0.009812081
 2:     A    2 0.840368463
 3:     A    3 0.883976812
 4:     A    4 0.961928210
 5:     A    5 0.132638244
 6:     B    1 0.637280622
 7:     B    2 0.867169067
 8:     B    3 0.702991461
 9:     B    4 0.899523194
10:     B    5 0.570103537
11:     C    1 0.100202309
12:     C    2 0.287019659
13:     C    3 0.617098238
14:     C    4 0.069866914
15:     C    5 0.103481549
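To make the mechanics of group_mean concrete, here is a minimal sketch that repeats the calculation for a single (group, time) cell by hand and checks it against the by-group result. It assumes a hypothetical variant, group_mean_ref, that takes the reference values as an argument instead of reading the global dt2; the test itself is still levene.test from lawstat, as in @jlhoward's answer. The set.seed call is an addition so the sketch is reproducible.

```r
library(data.table)
library(lawstat)

set.seed(1)  # added only to make this sketch reproducible
dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20, 20, 20))),
                  time = factor(rep(1:5, 12)),
                  values = runif(60, min = 0, max = 1),
                  key = c("group", "time"))
dt2 <- data.table(group = factor(rep(LETTERS[4], 15)),
                  values = runif(15, min = 0, max = 1),
                  key = "group")

# Hypothetical variant of group_mean: the reference values are passed in,
# so the function no longer depends on a global dt2
group_mean_ref <- function(values, ref) {
  grp <- rep(c("T", "C"), c(length(values), length(ref)))
  levene.test(c(values, ref), grp)$p.value
}

# The calculation done by hand for one cell: subject A at time 1
x <- dt1[group == "A" & time == "1", values]
p_manual <- group_mean_ref(x, dt2$values)

# The same calculation for every (group, time) cell via data.table's by
res <- dt1[, list(p_value = group_mean_ref(values, dt2$values)),
           by = c("group", "time")]
```

Passing the reference values explicitly is what allows the same function to be reused against more than one reference subject.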

Problem:

At some point, the status of other subjects may be determined, and their measured values would then be added to dt2. The following code represents this as dt3, which keeps the original reference values (group D) and adds two new reference subjects (E and F).

> dt3 <- data.table(group = factor(rep(LETTERS[4:6], c(15,15,15))),
+                   values = c(dt2$values, runif(30, min = 0, max = 1)),
+                   key = "group")

Question:

How could I modify the group_mean function so that it performs this variance test between the measured values of each subject in dt1 (by group and time) and those of each subject in dt3 (by group)?
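One possible approach, sketched here under a few assumptions rather than as a definitive answer: pass the reference values to the function explicitly (via a hypothetical group_mean_ref), split dt3's values by reference subject, run the existing dt1[, ..., by = c("group", "time")] calculation once per reference subject, and stack the results with rbindlist. The data are regenerated with a seed so the block runs on its own.

```r
library(data.table)
library(lawstat)

set.seed(42)  # added only to make this sketch reproducible
dt1 <- data.table(group = factor(rep(LETTERS[1:3], c(20, 20, 20))),
                  time = factor(rep(1:5, 12)),
                  values = runif(60, min = 0, max = 1),
                  key = c("group", "time"))
dt2 <- data.table(group = factor(rep(LETTERS[4], 15)),
                  values = runif(15, min = 0, max = 1),
                  key = "group")
dt3 <- data.table(group = factor(rep(LETTERS[4:6], c(15, 15, 15))),
                  values = c(dt2$values, runif(30, min = 0, max = 1)),
                  key = "group")

# Hypothetical helper: like group_mean, but the reference values are passed in
group_mean_ref <- function(values, ref) {
  grp <- rep(c("T", "C"), c(length(values), length(ref)))
  levene.test(c(values, ref), grp)$p.value
}

# Named list of reference values, one element per subject in dt3 (D, E, F)
refs <- split(dt3$values, dt3$group)

# For each reference subject g, compare every (group, time) subset of dt1 with it
res <- rbindlist(lapply(names(refs), function(g)
  dt1[, list(ref_group = g, p_value = group_mean_ref(values, refs[[g]])),
      by = c("group", "time")]))

# Order rows as in the desired output: group, then ref_group, then time
setorder(res, group, ref_group, time)
res
```

The lapply over names(refs) is still a loop, but only over the reference subjects; data.table's by handles the grouping over dt1's group and time. Because dt3's D values are dt2's values, the ref_group == "D" rows should reproduce the p-values from the original single-reference call.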

Although I could subset each subject from dt3 in a for loop, I am trying to better understand data.table and its syntax. An example of the desired output is below, with the p_value column intentionally left blank.

    group time ref_group p_value
 1:     A    1         D        
 2:     A    2         D        
 3:     A    3         D        
 4:     A    4         D        
 5:     A    5         D        
 6:     A    1         E        
 7:     A    2         E        
 8:     A    3         E        
 9:     A    4         E        
10:     A    5         E        
11:     A    1         F        
12:     A    2         F        
13:     A    3         F        
14:     A    4         F        
15:     A    5         F        
16:     B    1         D        
17:     B    2         D        
18:     B    3         D        
19:     B    4         D        
20:     B    5         D        
21:     B    1         E        
22:     B    2         E        
23:     B    3         E        
24:     B    4         E        
25:     B    5         E        
26:     B    1         F        
27:     B    2         F        
28:     B    3         F        
29:     B    4         F        
30:     B    5         F        
31:     C    1         D        
32:     C    2         D        
33:     C    3         D        
34:     C    4         D        
35:     C    5         D        
36:     C    1         E        
37:     C    2         E        
38:     C    3         E        
39:     C    4         E        
40:     C    5         E        
41:     C    1         F        
42:     C    2         F        
43:     C    3         F        
44:     C    4         F        
45:     C    5         F
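The desired table has one row for every combination of dt1 subject (A, B, C), time point (1 to 5) and reference subject in dt3 (D, E, F), that is 3 × 5 × 3 = 45 rows. As a sketch of that shape only (no p-values are computed), data.table's CJ can build the skeleton in the same row order. The hard-coded vectors below are assumptions taken from the example data.

```r
library(data.table)

groups     <- LETTERS[1:3]        # subjects in dt1 (assumed from the example)
times      <- as.character(1:5)   # time points in dt1
ref_groups <- LETTERS[4:6]        # reference subjects in dt3

# CJ sorts by its arguments in the order given (group, then ref_group, then time),
# which matches the row order of the desired output
skeleton <- CJ(group = groups, ref_group = ref_groups, time = times)

# Put the columns in the desired order and add an empty p_value column
setcolorder(skeleton, c("group", "time", "ref_group"))
skeleton[, p_value := NA_real_]
skeleton
```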

As always, thank you for your time and input/comments.
