0

I have the following dataset

    structure(list(Time = c(0L, 0L, 0L, 0L, 0L, 200L, 200L, 200L, 
200L, 200L, 400L, 400L, 400L, 400L, 400L), AgentID = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AC = c("c", 
"c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", 
"c"), Layer = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), Type = c("b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b"), Data = c(0, 0, 0, 0, 0, 0.117073864, 
0.13028602, 0.11111003, -0.11538354, 0.07852934, 0.24280901, 
0.24271743, 0.21535376, -0.2213944, 0.23355752), SimulationID = c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), discountFactor = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), N = c(80L, 
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 
80L)), row.names = c(NA, -15L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x55843aed0a70>)

What I want to do is to group the rows with with on=.(Layer,Type,AC,AgentID,Time,SimulationID,N,discountFactor), and do nodeVelocity:=Data-oldData where oldData is the Data column having oldTime=Time-200.

I tried

d[d[,.(Time=Time+200,Data), by=.(Layer,Type,AC,AgentID,SimulationID,N,discountFactor)], 
  `:=` (nodeVelocity=x.Data-i.Data, iData=i.Data)
  ,on=.(Layer,Type,AC,AgentID,Time = Time,SimulationID,N,discountFactor)]

However, in the above code, as you can see from its iData column, the subtraction is done via the last row of i.Data (i.Data =: oldData). Why is this? I was expecting an element-wise vector substraction but what I got is vector minus the last row of the other vector.

Our
  • 986
  • 12
  • 22

1 Answers1

1

Here is an option:

DT[, c("ri", "T2") := .(rowid(rleid(Time, Layer, Type, AC, AgentID, SimulationID, N, discountFactor)),
    Time+200L)]
DT[DT, on=.(Time=T2, Layer, Type, AC, AgentID, SimulationID, N, discountFactor, ri),
    diff := x.Data - i.Data]

output:

    Time AgentID AC Layer Type        Data SimulationID discountFactor  N ri  T2        diff
 1:    0       1  c     0    b  0.00000000            4              0 80  1 200          NA
 2:    0       1  c     0    b  0.00000000            4              0 80  2 200          NA
 3:    0       1  c     0    b  0.00000000            4              0 80  3 200          NA
 4:    0       1  c     0    b  0.00000000            4              0 80  4 200          NA
 5:    0       1  c     0    b  0.00000000            4              0 80  5 200          NA
 6:  200       1  c     0    b  0.11707386            4              0 80  1 400  0.11707386
 7:  200       1  c     0    b  0.13028602            4              0 80  2 400  0.13028602
 8:  200       1  c     0    b  0.11111003            4              0 80  3 400  0.11111003
 9:  200       1  c     0    b -0.11538354            4              0 80  4 400 -0.11538354
10:  200       1  c     0    b  0.07852934            4              0 80  5 400  0.07852934
11:  400       1  c     0    b  0.24280901            4              0 80  1 600  0.12573515
12:  400       1  c     0    b  0.24271743            4              0 80  2 600  0.11243141
13:  400       1  c     0    b  0.21535376            4              0 80  3 600  0.10424373
14:  400       1  c     0    b -0.22139440            4              0 80  4 600 -0.10601086
15:  400       1  c     0    b  0.23355752            4              0 80  5 600  0.15502818

On your 2nd qn, it is because there are multiple x rows for each i row in the join. After iterating through the join results, the last row is used. See a recent post: Can someone explain how mult works in data.table when it performs update in joins (using .EACHI and mult)

chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • yes, there are multiple `i` rows for each `x` row. But I thought since I'm not using `.EACHI`, that would handle all such x's and i's at the same time. – Our Jul 07 '21 at 11:57