0

I have a data.table in R that looks like the following:

> head(t)
   ITEM COUNTRY YEAR MONTH QUANTITY
1:    1       1 2014     1        2
2:    1       2 2014     1      133
3:    2       3 2014     1        2
4:    3       4 2014     1       20
5:    3       2 2014     1      142
6:    4       4 2014     1        5

There is for the same combination of ITEM-COUNTRY also data for other years/months; I have data from Jan 2014-Dec 2015 so 24 months. It might be that not all items/countries have data for all points, i.e. some have only for 19 months etc.

I know how to aggregate the quantity over the entire period:

agg = t[,list("TOT Q"=sum(QUANTITY)),by='ITEM,COUNTRY']

However, now I want something like this:

ITEM - COUNTRY - Q 1-2014 - Q 2-2014 - .... - Q 12-2015

In case there is no data for a specific item/country, the Q should be zero, and otherwise it should be the data in the corresponding row to the year and month. I.e. I want to aggregate on item and country, however, keep all the quantities and put them in different columns.

Edit:

Tried the comment of Jaap, which dit not work in the first place. But I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found stackoverflow.com/questions/30463591/… which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum).

What I use now:

dcast(t, ITEM + COUNTRY ~ MONTH + YEAR, value.var="QUANTITY", fun.aggregate=sum)
pk_22
  • 288
  • 1
  • 2
  • 18
  • 1
    Something like `dcast(t, ITEM + COUNTRY ~ paste0('M',MONTH,'_',YEAR))[, lapply(.SD, sum, na.rm = TRUE), by = .(ITEM, COUNTRY)]`? – Jaap Sep 20 '17 at 11:18
  • I'm trying it, but so far does not work. Also, I have more variables in the dataframe, after Quantity. So it is not using the right variables now. Also, does this immediately put the right value in the right columns? Like match the year and month with the column? – pk_22 Sep 20 '17 at 11:25
  • I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found https://stackoverflow.com/questions/30463591/r-reshape2-aggregation-function-missing-defaulting-to-length which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum). – pk_22 Sep 20 '17 at 11:42
  • If you have more variable you should specify `QUANTITY` with the `value.var`-parameter, e.g.: `dcast(t, ITEM + COUNTRY ~ paste0('M',MONTH,'_',YEAR), value.var = 'QUANTITY')[, lapply(.SD, sum, na.rm = TRUE), by = .(ITEM, COUNTRY)]` – Jaap Sep 20 '17 at 11:54
  • Yes I found it out. however, what is your last part doing? I got it working now without the last part lapply . – pk_22 Sep 20 '17 at 11:58
  • nvm, the last part isn't really needed; you can just use `fun.aggregate = sum` in the `dcast`-call – Jaap Sep 20 '17 at 12:31
  • Ah okay so that does the same trick ? – pk_22 Sep 20 '17 at 13:09
  • You should post your edit as an answer, I think, not as part of the question. See https://stackoverflow.com/help/self-answer – Frank Sep 20 '17 at 15:26
  • @Frank Thank you. Had no idea I could do that – pk_22 Sep 21 '17 at 07:14

1 Answers1

1

Tried the comment of Jaap, which dit not work in the first place. But I solved it. I was getting 0 and 1 as output and a warning of missing an aggregate function. Searching for this warning I found stackoverflow.com/questions/30463591/… which solved my problem. As I said, I have more variables, so for some item/country/month/year combinations I had duplicates, as they differed on other variables, so i needed an aggregate function (sum).

What I use now:

dcast(t, ITEM + COUNTRY ~ MONTH + YEAR, value.var="QUANTITY", fun.aggregate=sum)
pk_22
  • 288
  • 1
  • 2
  • 18