3

I have a data set with 5 columns:

store_id    year    event    item    units
123         2015     sale_2   abc      2
234         2015     sale_3   def      1
345         2015     sale_2   xyz      5

I'm trying to rotate out the items by store_id, year, and event to get the sum. For instance

store_id    year    event    abc     def   xyz 
123          2015    sale_2   2       0     0
234          2015    sale_3   0       1     0
345          2015    sale_2   0       0     5    

I'm having trouble figuring out the best method. Normally I'd use dummyVars in caret to do this but I need sums instead of flag. I've looked at tapply but it can't handle more than 2 grouping variables.

Any other suggestions?

rwat128
  • 3
  • 2
screechOwl
  • 27,310
  • 61
  • 158
  • 267

2 Answers2

6
library(reshape2)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units')
#    store_id year  event abc def xyz
# 1:      123 2015 sale_2   2   0   0
# 2:      234 2015 sale_3   0   1   0
# 3:      345 2015 sale_2   0   0   5

For large datasets consider

# uses dcast.data.table, much faster
library(data.table)
setDT(df)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units') 
C8H10N4O2
  • 18,312
  • 8
  • 98
  • 134
3
library(dplyr)
library(tidyr)
data %>%
group_by(store_id, year, event, item) %>%
summarize(N = sum(units)) %>%
spread(item, N)

You can use dplyr to group and summarize and tidyr to spread the data into the desired item columns.

troh
  • 1,354
  • 10
  • 19