1

I have two data tables. The first, DT1, is grouped by ID. It has a Week column that contains an instance of every consecutive week in a large time frame for every ID. It also contains a Units column. It looks something like this simplified version:

ID        Week        Units
A         1/1/2019    5
A         1/8/2019    9
A         1/15/2019   0
A         1/22/2019   1
B         1/1/2019    1
B         1/8/2019    32
B         1/15/2019   2
B         1/22/2019   6
C         1/1/2019    0
C         1/8/2019    0
C         1/15/2019   8
C         1/22/2019   3

The second table, DF2, has a time range window for every ID. This is represented by a start date column and a stop date column. It looks something like this simplified version:

ID         Start Date         Stop Date
A          1/1/2019           1/8/2019
B          1/8/2019           1/22/2019
C          1/8/2019           1/15/2019

I would like to calculate the sum of the units column for every date range/buyer combination in DF2. My desired output would be:

ID         Start_Date         Stop_Date       sumUnits
A          1/1/2019           1/8/2019        14
B          1/8/2019           1/22/2019       40
C          1/8/2019           1/15/2019       8

Is there a way to calculate this type of sum in R?

I have referenced the article, In R: how to sum a variable by group between two dates , as well as attempted the interval function and a non-equi join.

DT1[DT2[DT1, sum(x), on = .(Units, Week>= Stop_Date, Week<= Stop_Date),
                by = .EACHI], newvar := V1, on = .(Units, Start_Date 
                =Week)]

It returns the message, "Week" not in i. How do I solve the problem?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
etodd
  • 15
  • 3
  • 1
    r u looking for `DT[DT2, on=.(ID, Week>=Start_Date, Week<=Stop_Date), sum(Units), by=.EACHI]` ? – chinsoon12 Aug 01 '19 at 05:47
  • Next time, please provide example data in a reproducible format with proper date formatting like A.S.K.'s answer below (with `DF1 = `...). Guidance available here if you're interested: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Aug 01 '19 at 17:11

2 Answers2

1

If you don't mind using dplyr instead of data.table, you can use the fuzzyjoin package:

library(dplyr)
library(fuzzyjoin)

DF1 = data.frame(
  ID = c(rep("A", 4), rep("B", 4), rep("C", 4)),
  Week = rep(as.Date(c("1/1/2019", "1/8/2019", "1/15/2019", "1/22/2019"), format = "%m/%d/%Y")),
  Units = c(5, 9, 0, 1, 1, 32, 2, 6, 0, 0, 8, 3)
)
DF2 = data.frame(
  ID = c("A", "B", "C"),
  Start.Date = as.Date(c("1/1/2019", "1/8/2019", "1/8/2019"), format = "%m/%d/%Y"),
  Stop.Date = as.Date(c("1/8/2019", "1/22/2019", "1/15/2019"), format = "%m/%d/%Y")
)

fuzzy_inner_join(
  DF1, DF2,
  by = c("ID", "Week" = "Start.Date", "Week" = "Stop.Date"),
  match_fun = list(`==`, `>=`, `<=`)
) %>%
  group_by(ID.x, Start.Date, Stop.Date) %>%
  summarize(sumUnits = sum(Units))
A. S. K.
  • 2,504
  • 13
  • 22
1

I have referenced the article, In R: how to sum a variable by group between two dates, as well as attempted the interval function and a non-equi join.

Here's a variation on @akrun's answer that works:

library(data.table)
setDT(DF1)
setDT(DF2)

DF2[, v := 
  DF1[DF2, on=.(ID, Week >= Start.Date, Week <= Stop.Date), sum(x.Units), by=.EACHI]$V1
]

   ID Start.Date  Stop.Date  v
1:  A 2019-01-01 2019-01-08 14
2:  B 2019-01-08 2019-01-22 40
3:  C 2019-01-08 2019-01-15  8

I am using the input data as created in @A.S.K.'s answer.

To see how it works, try running simpler parts of it:

  • DF1[DF2, on=.(ID, Week >= Start.Date, Week <= Stop.Date), sum(x.Units), by=.EACHI]$V1
  • DF1[DF2, on=.(ID, Week >= Start.Date, Week <= Stop.Date), sum(x.Units), by=.EACHI]
  • DF1[DF2, on=.(ID, Week >= Start.Date, Week <= Stop.Date)]
Frank
  • 66,179
  • 8
  • 96
  • 180
  • 1
    I always have trouble wrapping my brain around data.table syntax, but this is beautifully concise. – A. S. K. Aug 01 '19 at 22:30