I have data that looks like this:

data_source   zip        date        calories      user            price
 compA        45768      18274        3500          abc             912.27
 compB        33098      18274        3500          groups          981.28
 compA        39104      18274        2500          ands            659.75

I would like to convert the data to wide format using dcast. This used to work, but now it does not.

data.table::dcast(zip + date + calories ~ data_source, value.var=c("user","price"), data=data)

As you can see, the columns in value.var hold both character and numeric values, so I'm not sure what to use in fun.aggregate. The conversion defaults to length, which is not what I want. I just want the values as they are, but in wide format. Thanks for your help.

relu
  • As akrun noted, `length` is used only when there are duplicate rows for the variables in your `x ~ y` formula. Try `DT[, if (.N > 1) .SD, by=.(zip, date, calories, data_source)]` to see the duplicated rows. – Frank 2 Jan 13 '20 at 18:30
  • @frank-2: This was a great help. Thanks. Please keep up the good work. – relu Jan 13 '20 at 23:18
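
To illustrate the duplicate check from the comment above, here is a minimal sketch on a small made-up table. The values are hypothetical (the user `xyz` and the price `100.00` are not from the question); the first two rows deliberately share the same zip, date, calories and data_source.

```r
library(data.table)

# Hypothetical data: rows 1 and 2 share zip, date, calories and data_source
DT <- data.table(
  data_source = c("compA", "compA", "compB"),
  zip         = c(45768, 45768, 33098),
  date        = c(18274, 18274, 18274),
  calories    = c(3500, 3500, 3500),
  user        = c("abc", "xyz", "groups"),
  price       = c(912.27, 100.00, 981.28)
)

# Keep only the groups that contain more than one row;
# groups with a single row return NULL and are dropped
dups <- DT[, if (.N > 1) .SD, by = .(zip, date, calories, data_source)]
print(dups)
```

If `dups` has no rows, there are no duplicates for that formula and dcast should not fall back to length.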

1 Answer

If the length is what you need, specify length in fun.aggregate:

library(data.table)
dcast(setDT(data), zip + date + calories ~ data_source, 
       value.var=c("user","price"), length)

Based on the data shown, there are no duplicates, so the call works without an aggregation function:

dcast(setDT(data), zip + date + calories ~ data_source, value.var=c("user","price"))

If there are duplicates, make each combination unique by adding rowid of the grouping variables to the left-hand side of the formula:

dcast(setDT(data), rowid(zip, date, calories) + zip + date + calories 
          ~ data_source, value.var=c("user","price"))
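
As a rough check of the rowid approach, the sketch below uses a small made-up table in which two compA rows share the same zip, date and calories. The extra user `xyz` and the price `100.00` are hypothetical values, not from the question.

```r
library(data.table)

# Hypothetical data: rows 1 and 2 are duplicates on zip, date, calories, data_source
DT <- data.table(
  data_source = c("compA", "compA", "compB"),
  zip         = c(45768, 45768, 33098),
  date        = c(18274, 18274, 18274),
  calories    = c(3500, 3500, 3500),
  user        = c("abc", "xyz", "groups"),
  price       = c(912.27, 100.00, 981.28)
)

# rowid() numbers the rows within each zip/date/calories group (1, 2, ...),
# so every left-hand-side combination is unique and no aggregation is triggered
wide <- dcast(DT, rowid(zip, date, calories) + zip + date + calories
                  ~ data_source, value.var = c("user", "price"))
print(wide)
```

With several value.var columns, the wide columns are named by value column and data_source (for example `user_compA` and `price_compB`), and the original values are kept rather than replaced by counts.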
akrun
  • As I mentioned in the question, `dcast(setDT(data), zip + date + calories ~ data_source, value.var=c("user","price"))` should work, but it doesn't: I get length, which is not what I want. Using `rowid` throws the error `Error in zip + date : non-numeric argument to binary operator`. – relu Jan 13 '20 at 18:52
  • @newbie Consider `dcast(setDT(data), rowid(zip, date, calories) + zip + date + calories ~ data_source, value.var=c("user","price"))`. Earlier, I had a typo: `rowid(zip + date + calories)` instead of `rowid(zip, date, calories)`. – akrun Jan 13 '20 at 18:54
  • Thanks mate for your help. It works. – relu Jan 13 '20 at 23:18