Here is a data.table:

Date     colA  colB  colC  .... month    year
01/23/15  2323  2323 2323        january  2015
.......

On this data.table I'm trying to: 1) sum all column values by month and then year, and 2) exclude the Date column from the subset that is returned.

I have set keys on the DT as follows:

setkey(DT, month, year)

Now I'm running this command to perform steps 1 and 2 above:

DT[ ,lapply(.SD, sum, na.rm=TRUE), by=.(month , year), .SDcols= 2:(length(colnames(DT))-2) ]

I took the above example from another SO post.

When I run this, I get the following error:

Error in gsum(`colA`, na.rm = TRUE) : 
  Type 'character' not supported by GForce sum (gsum). Either add the prefix base::sum(.) or turn off GForce optimization using options(datatable.optimize=1)

I'm not sure what this means or how to debug it.

Any assistance would be appreciated. Thanks

am1234

1 Answer

The error says that sum cannot be applied to a character vector, so colA is most likely stored as character rather than numeric. You can use str(DT) to see the type of each column in your data.
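
As a rough sketch of that check, the snippet below builds a small hypothetical table in which colA has been read in as text, then lists the class of every column and picks out the character columns among those to be summed. The data and the helper names value_cols and bad_cols are made up for illustration; they are not from your DT.

```r
library(data.table)

# Hypothetical data: colA is stored as character, as can happen when a
# file is read in with stray non-numeric values in that column
DT <- data.table(
  Date  = c("01/23/15", "01/24/15"),
  colA  = c("2323", "1212"),   # character, not numeric
  colB  = c(2323, 1112),
  month = c("january", "january"),
  year  = c(2015, 2015)
)

# Class of every column, as a named character vector
col_classes <- sapply(DT, class)
print(col_classes)

# Columns that are meant to be summed (everything except Date and the groups)
value_cols <- setdiff(names(DT), c("Date", "month", "year"))

# Of those, the ones stored as character; these are what sum() rejects
is_chr <- vapply(value_cols, function(col) is.character(DT[[col]]), logical(1))
bad_cols <- value_cols[is_chr]
print(bad_cols)
```

Any column that shows up in bad_cols would need to be converted to a numeric type before it can be summed.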

I created a similar dataset and used the code you provided and it worked for me:

library(data.table)
DT = data.table("Date" = c('01/23/15', '01/24/15', '02/23/15', '02/24/15'),
        "colA" = c(2323, 1212, 1234, 2345),
        "colB" = c(2323, 1112, 1134, 2245),
        "colC" = c(2323, 1012, 1434, 2445),
        "month" = c('january', 'january', 'february', 'february'),
        "year" = c(2015, 2015, 2015, 2015)
)

setkey(DT, month, year)

DT[ ,lapply(.SD, sum, na.rm=TRUE), by=.(month , year), .SDcols= 2:(length(colnames(DT))-2) ]
      month year colA colB colC
1: february 2015 3579 3379 3879
2:  january 2015 3535 3435 3335
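
If str(DT) does show character columns, one possible fix is to convert them to numeric before summing. The sketch below assumes the text values really are numbers (anything that is not will become NA with a warning), and it selects the value columns by name rather than by position; the sample data and the name value_cols are hypothetical.

```r
library(data.table)

# Hypothetical data: colA is stored as text, which triggers the error
DT <- data.table(
  Date  = c("01/23/15", "01/24/15", "02/23/15"),
  colA  = c("2323", "1212", "1234"),
  colB  = c(2323, 1112, 1134),
  month = c("january", "january", "february"),
  year  = c(2015, 2015, 2015)
)

# Columns to sum: everything except Date and the grouping columns
value_cols <- setdiff(names(DT), c("Date", "month", "year"))

# Convert the value columns to numeric in place
DT[, (value_cols) := lapply(.SD, as.numeric), .SDcols = value_cols]

# Same aggregation as in the question, now on numeric columns
res <- DT[, lapply(.SD, sum, na.rm = TRUE), by = .(month, year), .SDcols = value_cols]
print(res)
```

Selecting .SDcols by name avoids relying on the column order, which the positional 2:(length(colnames(DT))-2) form depends on.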
Tchotchke