Allocate along members of a group in data.table

Question

I have a table of Demand that looks something like this:

set.seed(1)
DTd <- data.table(loc="L1", product="P1", cust=c("C1","C2","C3"), period=c("per1","per2","per3","per4"), qty=runif(12,min=0,max=100), key=c("loc","product","cust","period"))
DTd[]
#   loc product cust period      qty
#1:  L1      P1   C1   per1 12.97134
#2:  L1      P1   C1   per2 65.37663
#3:  L1      P1   C1   per3 34.21633
#4:  L1      P1   C1   per4 24.23550
#5:  L1      P1   C2   per1 85.68853
#6:  L1      P1   C2   per2 98.22407
#7:  L1      P1   C2   per3 92.24086
#8:  L1      P1   C2   per4 70.62672
#9:  L1      P1   C3   per1 62.12432
#10:  L1      P1   C3   per2 84.08788
#11:  L1      P1   C3   per3 82.67184
#12:  L1      P1   C3   per4 53.63538

And a table of Supply that looks something like this:

DTs <- data.table(loc="L1", product="P1", period=c("per1","per2","per3","per4"), qty=runif(4,min=0,max=200), key=c("loc","product","period"))
DTs[]
#   loc product period       qty
#1:  L1      P1   per1   9.23293
#2:  L1      P1   per2  74.03622
#3:  L1      P1   per3 133.54770
#4:  L1      P1   per4 123.43913

I need to allocate the supply to the corresponding demand on a priority basis and add a column 'allocated' to the demand table. For purpose of this example we will assume that priority is by smallest demand first.

This is the outcome that I am looking for.

#loc product cust period      qty     alloc
#1:  L1      P1   C1   per1 12.97134  9.232930
#2:  L1      P1   C1   per2 65.37663 65.376625
#3:  L1      P1   C1   per3 34.21633 34.216329
#4:  L1      P1   C1   per4 24.23550 24.235499
#5:  L1      P1   C2   per1 85.68853  0.000000
#6:  L1      P1   C2   per2 98.22407  0.000000
#7:  L1      P1   C2   per3 92.24086 16.659531
#8:  L1      P1   C2   per4 70.62672 45.568249
#9:  L1      P1   C3   per1 62.12432  0.000000
#10:  L1      P1   C3   per2 84.08788  8.659591
#11:  L1      P1   C3   per3 82.67184 82.671841
#12:  L1      P1   C3   per4 53.63538 53.635379

I don't see the way to do this efficiently using the features of data.table. I seem to be reduced to looping through the rows and updating using set on a row by row basis. This is the code that I used in this case.

#set key on demand to match supply and order by the qty (for prioritising
setkey(DTd, loc, product, period, qty)
#add a column for the allocated quantity
DTd[,alloc:=0]
#loop through the rows of the supply, using the row number
for (s in DTs[, .I]) {
    key <- DTs[s, .(loc, product, period)]
    suppqty <- DTs[s, qty]
    #loop through the corresponding demand and return the row number
    for (d in DTd[key, which=TRUE]) {
        if (suppqty == 0) break
        #determine the quantity to allocate from the demand row
        allocqty <- DTd[d, ifelse(qty < suppqty, qty, suppqty)]
        #update the alloc qty on this row
        set(DTd, d, 6L, allocqty)
        #reduce the amount outstanding
        suppqty <- suppqty - allocqty
    }
}
#restore the original keys
setkey(DTd, loc, product, cust, period)

Any suggestions for a better way to achieve any part of this would be much appreciated. (In practice the tables are quite large and the priority rules can be quite complex, but in this case I would do a first pass to determine priority and then use this in the allocation pass).

Fyi, `ifelse(qty < suppqty, qty, suppqty)` can be computed as `pmin(suppqty, qty)`. — Frank, Apr 19 '16 at 06:08

score 3 · Accepted Answer · edited May 23 '17 at 12:31

You can do

setnames(DTs, "qty", "suppqty")
setnames(DTd, "qty", "demqty")
setorder(DTd, loc, product, period, demqty) # put your priority column last here

DTd[DTs, alloc := {
  resid_supply = shift(pmax(suppqty - cumsum(demqty), 0), fill=suppqty[1L])
  pmin(demqty, resid_supply)
}, by=.EACHI, on=c("loc", "product", "period")]

The result is

    loc product cust period    demqty     alloc
 1:  L1      P1   C2   per1 20.168193 20.168193
 2:  L1      P1   C1   per1 26.550866 26.550866
 3:  L1      P1   C3   per1 62.911404 62.911404
 4:  L1      P1   C1   per2  6.178627  6.178627
 5:  L1      P1   C2   per2 37.212390 37.212390
 6:  L1      P1   C3   per2 89.838968 33.429727
 7:  L1      P1   C2   per3 20.597457 20.597457
 8:  L1      P1   C3   per3 57.285336 57.285336
 9:  L1      P1   C1   per3 94.467527 76.085490
10:  L1      P1   C3   per4 17.655675 17.655675
11:  L1      P1   C2   per4 66.079779 66.079779
12:  L1      P1   C1   per4 90.820779 15.804394

You don't generally need to set the key before a merge these days, as described by one of the package authors, Arun, in this SO post:

In most cases therefore, there shouldn't be a need to set keys anymore. We recommend using on= wherever possible, unless setting key has a dramatic improvement in performance that you'd like to exploit.

For a similar computation (procurement by lowest-price priority), you can see my other answer.

Brilliant! Thanks @Frank. That has taught me two new things: 1. _shift_ (I had written my own version of this) 2. use of multiple statements in the {} of the assignment. I will post some performance comparisons when I apply to a larger table. — sch56, Apr 19 '16 at 20:52
Not an exact comparison, but for a table of 9723 demands times 13 periods (120383 rows in _melt_ed format): _For_ loop and _set_ 77.3 seconds user time. @Frank's construct 4.64 seconds user time. More than x16 faster. — sch56, Apr 21 '16 at 03:59

Allocate along members of a group in data.table

1 Answers1