1

I have a data set with sales numbers from individual items ordered on various dates. All items in a particular order share the same ID. I want to calculate order totals (i.e. sum of sales for all items in a particular order), while preserving the date associated with each order (we can assume all items part of an order share the same date). How do I sum sales with respect to ID, while preserving the date?

This questions is different from others I've seen, because I want to preserve and collapse the Date column while summing with respect to a different column, Sales.

Columns before: Date, ID, Sales

Columns after: Date, ID, Order.Total

The following code returns an error because dates obviously cannot be summed :

df[, lapply(.SD, sum), by = "ID"]

The following code removes the Date field altogether :

df[, lapply(.SD, sum), by = "ID", .SDcols = !"Date"]

For example, if my data set before is :

DATE  ID  SALES
1/2   01    1 
1/2   01    2
1/2   02    3
1/3   03    6
1/4   04    5
1/4   04    4

My data set after should be :

DATE  ID  ORDER.TOTAL
1/2   01    3 
1/2   02    3
1/3   03    6
1/4   04    9

1 Answers1

1

I assume you're after this?

dt[, .(ORDER.TOTAL = sum(SALES)), by =.(DATE, ID)]
#    DATE ID ORDER.TOTAL
#1:  1/2  1           3
#2:  1/2  2           3
#3:  1/3  3           6
#4:  1/4  4           9

which is the same as

dt[, list(ORDER.TOTAL = sum(SALES)), by = list(DATE, ID)]

Or the same in base R using aggregate

aggregate(SALES ~ DATE + ID, data = as.data.frame(dt), FUN = sum)
#  DATE ID SALES
#1  1/2  1     3
#2  1/2  2     3
#3  1/3  3     6
#4  1/4  4     9

Sample data

library(data.table)
dt <- fread("DATE  ID  SALES
1/2   01    1
1/2   01    2
1/2   02    3
1/3   03    6
1/4   04    5
1/4   04    4")
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68