Aggregate data.table on conditions but add columns

Question

I have a data.table in R that looks like the following:

> head(t)
   ITEM COUNTRY YEAR MONTH QUANTITY
1:    1       1 2014     1        2
2:    1       2 2014     1      133
3:    2       3 2014     1        2
4:    3       4 2014     1       20
5:    3       2 2014     1      142
6:    4       4 2014     1        5

There is for the same combination of ITEM-COUNTRY also data for other years/months; I have data from Jan 2014-Dec 2015 so 24 months. It might be that not all items/countries have data for all points, i.e. some have only for 19 months etc.

I know how to aggregate the quantity over the entire period:

agg = t[,list("TOT Q"=sum(QUANTITY)),by='ITEM,COUNTRY']

However, now I want something like this:

ITEM - COUNTRY - Q 1-2014 - Q 2-2014 - .... - Q 12-2015

In case there is no data for a specific item/country, the Q should be zero, and otherwise it should be the data in the corresponding row to the year and month. I.e. I want to aggregate on item and country, however, keep all the quantities and put them in different columns.

Edit:

Tried the comment of Jaap, which dit not work in the first place. But I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found stackoverflow.com/questions/30463591/… which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum).

What I use now:

dcast(t, ITEM + COUNTRY ~ MONTH + YEAR, value.var="QUANTITY", fun.aggregate=sum)

Something like `dcast(t, ITEM + COUNTRY ~ paste0('M',MONTH,'_',YEAR))[, lapply(.SD, sum, na.rm = TRUE), by = .(ITEM, COUNTRY)]`? — Jaap, Sep 20 '17 at 11:18
I'm trying it, but so far does not work. Also, I have more variables in the dataframe, after Quantity. So it is not using the right variables now. Also, does this immediately put the right value in the right columns? Like match the year and month with the column? — pk_22, Sep 20 '17 at 11:25
I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found https://stackoverflow.com/questions/30463591/r-reshape2-aggregation-function-missing-defaulting-to-length which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum). — pk_22, Sep 20 '17 at 11:42
If you have more variable you should specify `QUANTITY` with the `value.var`-parameter, e.g.: `dcast(t, ITEM + COUNTRY ~ paste0('M',MONTH,'_',YEAR), value.var = 'QUANTITY')[, lapply(.SD, sum, na.rm = TRUE), by = .(ITEM, COUNTRY)]` — Jaap, Sep 20 '17 at 11:54
Yes I found it out. however, what is your last part doing? I got it working now without the last part lapply . — pk_22, Sep 20 '17 at 11:58
nvm, the last part isn't really needed; you can just use `fun.aggregate = sum` in the `dcast`-call — Jaap, Sep 20 '17 at 12:31
You should post your edit as an answer, I think, not as part of the question. See https://stackoverflow.com/help/self-answer — Frank, Sep 20 '17 at 15:26

score 1 · Answer 1 · answered Sep 21 '17 at 07:13

Tried the comment of Jaap, which dit not work in the first place. But I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found stackoverflow.com/questions/30463591/… which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum).

What I use now:

dcast(t, ITEM + COUNTRY ~ MONTH + YEAR, value.var="QUANTITY", fun.aggregate=sum)

Aggregate data.table on conditions but add columns

1 Answers1