I have a data.table that looks like:
A <- c(1,3,5,20,21,21)
B <- c(1, 2, 3, 4, 5, 6)
C <- c("I","I","II","II","III","III")
D <- c(0.7, 0.3, 0.5, 0.9, 4, 7)
M <- data.table(A,B,C,D)
My question is similar to R help: divide values by sum produced through factor with a few extra considerations. A
specifies a date (I'm simply using integers here). B
are individuals. C
is a classification in the individual belongs to. D
is a value variable.
For each classification c
of C
, for each day a
of A
, divide the value D
by the sum of the values for all individuals in c
, carrying backward when needed such that 0<x-a<=N
where x is the date of another individual (meaning that we pick the smallest x-a and use that as an approximation for the value of the other individual in group c
on day a).
Let's say N=5. Here's my expected output.
A <- c(1,3,5,20,21,21)
B <- c(1, 2, 3, 4, 5, 6)
C <- c("I","I","II","II","III","III")
D <- c(0.7/(0.7+0.3), 0.3/(0.3), 0.5/(0.5), 0.9/(0.9), 4/(4+7), 7/(4+7))
M <- data.table(A,B,C,D)
Note that the values for group B are not carried backward for individual 3, as the length is greater than 5 (20-5). Is there a nice way of doing this in data.table
?
For each value in D, I wish to divide by the sum of all the values of the same group (either I, II,II) on that day. However, you'll notice for some groups, observations do not exist on that day. I'll try and walk through the logic on a few observations.
Edit: Let me try and walk through a few cases.
For individual 1 (column B) on day 1 (column A), the individual is of group I (column C). Other individuals of group I are: 2. For each of those others, we see that for individual 2, their nearest observation is on day 3 and 3-1<=5, so we'll use 0.3 in the denominator.
For individual 3 (column B) on day 5 (column A), the individual is of group II (column C). Other individuals of group II are: 3. For each of those others, we see that for individual 3, their nearest observation is on day 20 and 20-5>5, so we cannot use their observation in the denominator.