I am using data.table and dplyr together, and I recently noticed that dplyr::group_by also sets the key on a data.table object.

# R version 3.1.0    
library(data.table) # 1.9.2
library(dplyr) # 0.1.3

dt <- data.table(A=rep(c("a", "b"), times=c(2, 3)), B = rep(1, 5))
tables()
#      NAME NROW MB COLS KEY
# [1,] dt      5  1 A,B
# Total: 1MB

group_by(dt, A)
tables()
#      NAME NROW MB COLS KEY
# [1,] dt      5  1 A,B  A
# Total: 1MB

Why does this happen? Is it intended? As far as I know, Hadley is trying to make dplyr compatible with data.table.

(If possible, I would also like to know how keys are implemented in data.table. In particular, I am curious why setkey can modify the table in place.)
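
To make the in-place part concrete, here is a minimal sketch using only data.table (no dplyr). It assumes a data.table version that exports `address()` and `copy()`. The object names `addr_before`, `dt2` and `dt2_keyed` are only for illustration. It shows the behaviour I am asking about and does not explain how it is implemented.

```r
library(data.table)

dt <- data.table(A = rep(c("a", "b"), times = c(2, 3)), B = rep(1, 5))
addr_before <- address(dt)   # memory address of dt before keying

key(dt)                      # NULL: no key yet
setkey(dt, A)                # no assignment; dt itself is modified
key(dt)                      # "A"

# Compare the address of dt before and after setkey
identical(addr_before, address(dt))

# Keying an explicit copy leaves the original object without a key
dt2 <- data.table(A = c("b", "a"), B = 1:2)
dt2_keyed <- setkey(copy(dt2), A)
key(dt2)                     # NULL
key(dt2_keyed)               # "A"
```

If group_by calls setkey internally on the table it is given, this would be consistent with the key showing up in tables() afterwards, but that is my guess and not something I have confirmed.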

Thanks


Per G. Grothendieck's request:

library(data.table)
dt <- data.table(A = rep(c("a", "b"), times=c(2, 3)),
                 B = rep(1, 5))
dplyr::group_by(dt, A)
# Source: local data table [5 x 2]
# Groups: A
#
# Error in if (is.na(rows) || rows > getOption("dplyr.print_max")) { :
#   missing value where TRUE/FALSE needed

tables()
#      NAME NROW MB COLS KEY
# [1,] dt      5  1 A,B  A
# Total: 1MB

I use these two packages quite often, so I would like to understand the details in order to avoid mistakes.

yuez
  • Please make your example self contained and reproducible showing exactly which `library` statements you used and which versions of the packages you are using. If not most recent versions please try it again with the most recent as well. If dplyr was not loaded then also try it with dplyr loaded removing the dplyr:: . What do you get? – G. Grothendieck Apr 26 '14 at 13:50
  • It would seem to be a reasonable event to expect. `group_by` would just be executing `setkey` which creates a hashkey in the object. data.table syntax is somewhat "different" than R syntax, so it does not seem at all unreasonable that there be a side-effect of this sort. – IRTFM Apr 26 '14 at 15:13
  • Thanks @G.Grothendieck, I have put on more details. – yuez Apr 26 '14 at 21:31
  • I suggest you try this with the most recent version of dplyr from CRAN and then try it again with the most recent version of dplyr from github. – G. Grothendieck Apr 26 '14 at 21:40
  • I'm on commit 1482 from github page for `dplyr`. And that doesn't set the key. [Here's the relevant post](http://stackoverflow.com/a/22517701/559784). Follow the comments as well. I'm guessing Hadley's made this change recently, as per the discussions in that post. – Arun Apr 26 '14 at 21:48

0 Answers