I have two data frames, ord
with 15k rows
uniq.id prev.ord.dt cur.ord.dt
<chr> <date> <date>
1 4892267119791 2016-04-28 2016-06-09
2 7174853145105 2016-04-26 2016-05-10
3 9600318129252 2016-07-07 2016-07-11
4 7150993150290 2016-03-30 2016-04-13
5 3121040102603 2016-05-01 2016-05-18
6 4899102101891 2016-04-29 2016-05-08
7 7174853112259 2016-05-10 2016-05-24
8 4648283132540 2016-04-25 2016-05-09
9 9600318144830 2016-05-12 2016-05-19
10 9600318134838 2016-05-23 2016-06-06
...
and dlvrd
. with 39k rows
uniq.id dlvrd.dt dlvrd.qty
<chr> <date> <int>
1 9600318114229 2016-02-24 10
2 2594775116151 2016-04-07 22
3 4935357145929 2016-05-26 6
4 4127487134929 2016-05-18 9
5 4935357144169 2016-05-10 62
6 1180975135573 2016-03-16 11
7 3121040102604 2016-06-17 22
8 1580171115586 2016-01-12 240
9 3011291147292 2016-03-25 12
10 4892267115584 2016-05-16 21
...
EDIT: There are about 2k total of the uniq.id
, and each one has multiple delivery periods under it (assume for simplicity that a delivery period is indicated by cur.ord.dt
). I need to aggregate the total quantity that was delivered for each uniq.id
for each delivery period, but the method I'm currently using is taking ~35 minutes to run. I defined a custom function
add.dlvrd <- function(uniq, prev, cur) {
require(dplyr)
dlvrd <- get("dlv.data", envir = .GlobalEnv)
dlvrd <- dlvrd %>% filter(uniq.id==uniq,
dlvrd.dt >= prev,
dlvrd.dt < cur) %$% sum(dlvrd.qty) %>% return()
}
and then run mapply
ord$dlvrd.qty <- ord %$% mapply(add.dlvrd, uniq.id, prev.ord.dt, cur.ord.dt)
Is there a more elegant way of doing this?
Side note: I'm aware that it's bad form to use "." in variable and function names instead of "_", but I don't have time to change it right now.