This is Henrik's answer (and if they come back, I'll happily give this answer to them ... somehow):
dat[, res := .(Reduce(c, j, accumulate=TRUE)), by = gr]
#        j    gr          res
#    <num> <num>       <list>
# 1:     3     9            3
# 2:     8     9          3,8
# 3:     9     9        3,8,9
# 4:    11     9   3, 8, 9,11
# 5:    10    10           10
# 6:    28    10        10,28
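For anyone who wants to try the idea without data.table, here is a minimal base-R sketch. The vectors j and gr are reconstructed from the printed output above (an assumption on my part), and res_by_group is just a name I made up. It uses split() and lapply() in place of by=, so treat it as an analogue rather than Henrik's code.

```r
# Data reconstructed from the printed table (an assumption)
j  <- c(3, 8, 9, 11, 10, 28)
gr <- c(9, 9, 9, 9, 10, 10)

# For each group, accumulate the growing vector of values seen so far
res_by_group <- lapply(split(j, gr), function(v) Reduce(c, v, accumulate = TRUE))

# Inspect the accumulated vectors for group gr = 9
str(res_by_group[["9"]])
```

Each element of res_by_group is a list with one entry per row of that group, which is the same shape as the res list-column.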
Reduce is similar to sapply except that it operates on the current value and results of the previous operation. For instance, we can see
sapply(1:3, function(z) z*2)
# [1] 2 4 6
This, unrolled, equates to
1*2 # 2
2*2 # 4
3*2 # 6
That is, the calculation on one element of the vector/list is completely independent, never knowing the results from previous iterations.
However, Reduce is explicitly given the results of the previous calculation. By default, it will only return the last calculation, which would be analogous to tail(sapply(...), 1):
Reduce(function(prev, this) prev + this*2, 11:13)
# [1] 61
That seems a bit obscure ... let's look at all of the interim steps, where the answer above is the last:
Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
# [1] 11 35 61
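To convince yourself that the default call really is just the last of those interim steps, here is a quick check. The names f, all_steps, and last_only are ones I picked for illustration.

```r
f <- function(prev, this) prev + this * 2

all_steps <- Reduce(f, 11:13, accumulate = TRUE)  # every interim value
last_only <- Reduce(f, 11:13)                     # default: final value only

# The last interim value should match the default result
identical(tail(all_steps, 1), last_only)
```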
In this case (without specifying init=, wait for it), the first result is just the first value in x=, not run through the function. If we unroll this, we'll see
11           # 11 is the first value in x; it is passed along as prev
11 + 12*2    # 35
35 + 13*2    # 61
Sometimes we need the first value in x= to be run through the function, with a starting condition (a first-time value for prev when we don't have a previous iteration to use). For that, we can use init=; we can think of the use of init= by looking at two perfectly-equivalent calls:
Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
Reduce(function(prev, this) prev + this*2, 12:13, init = 11, accumulate = TRUE)
# [1] 11 35 61
(Without init=, Reduce will take the first element of x= and assign it to init= and remove it from x=.)
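If it helps, that init= handling can be sketched as a plain loop. This is a simplified, hypothetical re-implementation (my_reduce is my own name, and it only covers the accumulate = TRUE, left-to-right case with a non-empty x), not how base R actually writes it.

```r
# Hypothetical helper: a simplified stand-in for
# Reduce(f, x, init, accumulate = TRUE), for illustration only
my_reduce <- function(f, x, init) {
  if (missing(init)) {
    init <- x[[1]]   # first element becomes the starting "prev"
    x <- x[-1]       # ... and is removed from the values to iterate over
  }
  out <- vector("list", length(x) + 1)
  out[[1]] <- init
  for (i in seq_along(x)) {
    out[[i + 1]] <- f(out[[i]], x[[i]])  # prev is the previous result
  }
  out
}

f <- function(prev, this) prev + this * 2
unlist(my_reduce(f, 11:13))             # no init=: 11 is taken from x
unlist(my_reduce(f, 12:13, init = 11))  # the same thing, spelled out
```

Both calls walk through the same steps, which is why the two Reduce calls above are equivalent.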
Now let's say we want the starting condition (the injected "previous" value) to be 0; then we would do
Reduce(function(prev, this) prev + this*2, 11:13, init = 0, accumulate = TRUE)
# [1] 0 22 46 72
### unrolled
0            # 0 is the init= value; it is passed along as prev
0 + 11*2     # 22
22 + 12*2    # 46
46 + 13*2    # 72
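One practical consequence worth noting: with init= and accumulate = TRUE, the result has one more element than x, because the init value itself comes first. If you need exactly one result per input (say, to fill a column), a sketch like this drops it. The names steps and per_input are only illustrative.

```r
f <- function(prev, this) prev + this * 2
x <- 11:13

steps <- Reduce(f, x, init = 0, accumulate = TRUE)
length(steps) == length(x) + 1   # the init value is included up front

per_input <- steps[-1]           # drop init: one value per element of x
per_input
```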
Let's bring that back to this question and this data. I'll inject a browser() and change the function a little so that we can look at all intermediate values.
> dat[, res := .(Reduce(function(prev, this) { browser(); c(prev, this); }, j, accumulate=TRUE)), by = gr]
Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 2
[1] 3
Browse[2]> this
[1] 8
Browse[2]> c(prev, this)
[1] 3 8
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 3
[1] 3 8
Browse[2]> this
[1] 9
Browse[2]> c(prev, this)
[1] 3 8 9
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 4
[1] 3 8 9
Browse[2]> this
[1] 11
Browse[2]> c(prev, this)
[1] 3 8 9 11
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=10`, row 6
[1] 10
Browse[2]> this
[1] 28
Browse[2]> c(prev, this)
[1] 10 28
Browse[2]> c # 'c'ontinue
Notice how we didn't "see" rows 1 or 5, since they were the init= conditions for the reduction (the first prev value seen in each group).
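If stepping through browser() interactively isn't convenient (say, in a script), a rough non-interactive alternative is to log each call instead. This is only a sketch: calls and log_c are hypothetical names, and I'm using just the gr=9 values from above.

```r
calls <- list()  # will hold one entry per invocation of log_c

log_c <- function(prev, this) {
  # record the arguments of this call, then do the real work
  calls[[length(calls) + 1]] <<- list(prev = prev, this = this)
  c(prev, this)
}

res <- Reduce(log_c, c(3, 8, 9, 11), accumulate = TRUE)

length(calls)   # one call fewer than there are values
calls[[1]]      # first call: prev is the first value, this is the second
```

The first value never shows up as this, only as prev, which matches what the browser session showed.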
Reduce can be a difficult function to visualize and work with. When I use it, I almost always pre-insert browser() into the anon-function and walk through the first three steps: the first to ensure the init= is correct, the second to make sure the anon-function is doing what I think I want with the init and next value, and the third to make sure that it continues properly. This is similar to proof by induction: the nth calc will be correct because we know the (n-1)th calc is correct.