2

Let's say we have two tables:

A table of budgets:

Item    Budget
A       900
B       350
C       100
D       0

bDT = structure(list(Item = c("A", "B", "C", "D"), Budget = c(900L, 
350L, 100L, 0L)), .Names = c("Item", "Budget"), row.names = c(NA, 
-4L), class = "data.frame")

and a table of expected expenses by item per date.

 Item       Date Expense
    A 2017-08-24     850
    B 2017-08-18     300
    B 2017-08-11      50
    C 2017-08-18      50
    C 2017-08-11     100
    D 2017-08-01     500

expDF = structure(list(Item = c("A", "B", "B", "C", "C", "D"), Date = structure(c(17402, 
17396, 17389, 17396, 17389, 17379), class = "Date"), Expense = c(850L, 
300L, 50L, 50L, 100L, 500L)), .Names = c("Item", "Date", "Expense"
), row.names = c(NA, -6L), class = "data.frame")

I'm looking to summarize the amount we can spend per item per date like this:

Item    Date        Spend
A       8/24/2017   850
B       8/18/2017   300
B       8/11/2017   50
C       8/18/2017   50
C       8/11/2017   50
D       8/1/2017    0
Frank
  • 66,179
  • 8
  • 96
  • 180
J.D. Marlin
  • 253
  • 1
  • 3
  • 15
  • Fyi, it's better/easier for everyone if you use `Date` class columns and provide code that easily reproduces the example. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 for guidance. – Frank Jul 31 '17 at 18:50
  • 2
    I don't see what this has to do with cumulative summing. But I don't understand the logic of how the result is produced. Why is `D` 0? – Gregor Thomas Jul 31 '17 at 18:50
  • @Gregor It's like `Budget - cumsum(data.table::shift(Expense, type = "lead"))`, I guess, showing the amount of budget prior to the current expense. Hm, no, that doesn't quite fit either, since both C's are 50, okay, I'm lost... maybe that, plus `pmax` or `pmin` somewhere. – Frank Jul 31 '17 at 18:52
  • @Gregor Spend is the total budget available per item per date (in descending order). So for D 'Spend' is 0 since we have no 'Budget' for it. – J.D. Marlin Jul 31 '17 at 19:01
  • Shouldn't `C 8/11/2017 50` equal to 0 then? – Majo Jul 31 '17 at 19:05
  • @Majo C has a total budget of 100. After `C 8/18/2017 50` (later date takes priority) 50 remains in the budget, therefore: `C 8/11/2017 50`. – J.D. Marlin Jul 31 '17 at 19:07

1 Answers1

1

This works:

library(data.table)
setDT(bDF); setDT(expDF)

expDF[bDF, on=.(Item), Spending :=
  pmin(
    Expense, 
    pmax(
      0, 
      Budget - cumsum(shift(Expense, fill=0))
    )
  )
, by=.EACHI]

   Item       Date Expense Spending
1:    A 2017-08-24     850      850
2:    B 2017-08-18     300      300
3:    B 2017-08-11      50       50
4:    C 2017-08-18      50       50
5:    C 2017-08-11     100       50
6:    D 2017-08-01     500        0

How it works

  • cumsum(shift(Expense, fill = 0)) is prior spending**
  • max(0, Budget - prior spending) is remaining budget
  • min(Expense, remaining budget) is current spending

The data.table syntax x[i, on=, j, by=.EACHI] is a join. In this case j takes the form v := expr, which adds a new column to x. See ?data.table for details.


** Well, "prior" in ordering of the table. I'll ignore the OP's weird reversed dates.

Frank
  • 66,179
  • 8
  • 96
  • 180
  • Very nice! Also found these tutorials from DataCamp while installing data.table: [link](https://www.datacamp.com/courses/data-table-data-manipulation-r-tutorial) – J.D. Marlin Jul 31 '17 at 19:24
  • @J.D.Marlin Yup, that course by the data.table authors is quite helpful for getting up to speed on the package quickly. – Frank Jul 31 '17 at 19:27