13

I have two tables that I need to do a sumif across. Table 1 contains time periods, i.e. year and quarter at year end (i.e. 4, 8, 12 etc.). Table 2 contains the transactions during the year at quarters 3, 6, 7 etc.

I need Table 3 to sum all the transactions during the year so that I get the cumulative position at year end.

Here's some sample code to explain what the data looks like and what the output should look like:

library(data.table)

x1 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24), 
                 "Amount" = c(10000, 15000, -2500, 3500, -6500, 25000, 
                              11000, 9000, 7500))
x3 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36), 
                 "Amount" = c(10000, 22500, 19500, 55500, 64500, 72000, 
                              72000, 72000, 72000))

I've tried merge, summarise, foverlaps but can't quite figure it out.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
kodfather
  • 161
  • 8

1 Answers1

13

Nice question. What basically you are trying to do is to join by Name, Year and Quarter <= Quarter, while summing all the matched Amount values. This is both possible using the new non-equi joins (which were introduced in the latest stable version of data.table v-1.10.0) and foverlaps (while the latter will be probably sub-optimal)

Non-Equi joins:

x2[x1, # for each value in `x1` find all the matching values in `x2`
   .(Amount = sum(Amount)), # Sum all the matching values in `Amount`
   on = .(Name, Year, Quarter <= Quarter), # join conditions
   by = .EACHI] # Do the summing per each match in `i`
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000

As a side note, you can easily add Amount in place in x1 (proposed by @Frank):

x1[, Amount := 
  x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
]

This might be convenient if you have more than just the three join columns in that table.


foverlaps:

You mentioned foverlaps, so in theory you could achieve the same using this function too. Though I'm afraid you will easily get out of memory. Using foverlaps, you will need to create a huge table where each value in x2 joined multiple times to each value in x1 and store everything in memory

x1[, Start := 0] # Make sure that we always join starting from Q0
x2[, Start := Quarter] # In x2 we want to join all possible rows each time 
setkey(x2, Name, Year, Start, Quarter) # set keys
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • Thanks so much - I just got this working. Very much appreciated! It looks like both of my tables need to have the same columns in them. If x2 had one additional column that I didn't want included in the resulting table x3 , would the code be the same? – kodfather Jan 19 '17 at 14:23
  • You can specify in the `on` argument what ever column names you want from both tables. For instance `.on(column1 = column2, column3 = column4)`, etc. The LHS of the suasion are columns from `x1` while the RHS of the equation are columns from `x2`. – David Arenburg Jan 19 '17 at 14:24