0

I'm having troubles to aggregate differents shops by date and SKU.

My sample data is like this:

Date        SKU    Quantity Price 
2010/01/01  52144    1        4.00
2010/01/01  87548    5        0.50
2010/01/01  47852    3        1.99
2010/01/01  52144    4        3.80
2010/01/02  87548    1        0.55

And I'm trying to use the function agreggate to sty like this:

 Date        SKU    Quantity Price 
2010/01/01  52144    5        3.80
2010/01/01  87548    5        0.50
2010/01/01  47852    3        1.99
2010/01/02  87548    1        0.55

Thanks!

Zé Pinho
  • 83
  • 1
  • 1
  • 6

3 Answers3

8

You can do this easily with dplyr

library(dplyr)
group_by(df, Date, SKU) %>% 
    summarize(Quantity = sum(Quantity), Price = min(Price))
#         Date   SKU Quantity Price
# 1 2010/01/01 47852        3  1.99
# 2 2010/01/01 52144        5  3.80
# 3 2010/01/01 87548        5  0.50
# 4 2010/01/02 87548        1  0.55

And also with data.table

library(data.table)
setDT(df)[, .(Quantity = sum(Quantity), Price = min(Price)), by = .(Date, SKU)]
#          Date   SKU Quantity Price
# 1: 2010/01/01 52144        5  3.80
# 2: 2010/01/01 87548        5  0.50
# 3: 2010/01/01 47852        3  1.99
# 4: 2010/01/02 87548        1  0.55
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
  • Is there a way to do this "simmultaneously" using `aggregate`, `dplyr` or some other function/package? (e.g. the sum of one column and the minimum of another) – Barranka Nov 18 '14 at 22:12
  • 2
    Not sure about aggregate, but `dplyr` and `data.table` are much easier – Rich Scriven Nov 18 '14 at 22:26
4

When dealing with this kind of things, I've found that the sqldf package is a great tool. Let's say your data is stored in a data frame called df. You can use a SELECT instruction to get what you need:

sqldf("select Date, SKU, sum(a.Quantity) as Quantity, min(a.Price) as price
       from df as a
       group by Date, SKU")

You can use standard SQL instructions to manipulate, filter or aggregate data stored in data frames (which sqldf reads as if they were tables)

Barranka
  • 20,547
  • 13
  • 65
  • 83
4

cough base cough

dat <- read.table(header = TRUE, text = "Date        SKU    Quantity Price 
2010/01/01  52144    1        4.00
2010/01/01  87548    5        0.50
2010/01/01  47852    3        1.99
2010/01/01  52144    4        3.80
2010/01/02  87548    1        0.55")

tmp <- within(dat, {
  sums <- ave(Quantity, list(Date, SKU), FUN = sum)
  mins <- ave(Price, list(Date, SKU), FUN = min)
})

tmp[!with(tmp, duplicated(cbind(Date, SKU))), ]

#         Date   SKU Quantity Price mins sums
# 1 2010/01/01 52144        1  4.00 3.80    5
# 2 2010/01/01 87548        5  0.50 0.50    5
# 3 2010/01/01 47852        3  1.99 1.99    3
# 5 2010/01/02 87548        1  0.55 0.55    1
rawr
  • 20,481
  • 4
  • 44
  • 78