I recently downloaded this version of R:

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

The behavior of setkey() and unique() on data tables has changed. Under the previous version of R (3.3.3), the following example would not return 7 rows, since there are two rows with V2=="D". This seems like a big change. Is it intended?

library(data.table)
dt <- data.table(
  V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
  V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)]
)
setkey(dt, "V2")
unique(dt)

   V1 V2
1:  B  A
2:  A  B
3:  A  C
4:  A  D
5:  C  D
6:  E  F
7:  G  G

str(dt)
Classes ‘data.table’ and 'data.frame':  10 obs. of  2 variables:
$ V1: chr  "B" "A" "A" "A" ...
$ V2: chr  "A" "B" "B" "B" ...
- attr(*, ".internal.selfref")=<externalptr> 
- attr(*, "sorted")= chr "V2"
Marc
  • What version of the `data.table` library are you using? (see `sessionInfo()` to verify). Check out this potentially breaking change from version 1.9.8+: https://github.com/Rdatatable/data.table/blob/master/NEWS.md#potentially-breaking-changes. Perhaps you were using an older version of the library before? `unique()` now looks at all columns, not just the key. – MrFlick Jun 19 '17 at 20:26
  • Don't use `+` and `>` if you want anyone to be able to test your code. – Frank Jun 19 '17 at 20:28
  • I'm using data.table_1.10.4 now, the same version as before. In the past, unique() would look at all columns if the data table had no sort key; otherwise it would look only at the key column and return the unique rows for that column. – Marc Jun 19 '17 at 21:57
  • Here's another Stack Overflow question with the same example from 4 years ago, with a different result: https://stackoverflow.com/questions/11792527/filtering-out-duplicated-non-unique-rows-in-data-table – Marc Jun 19 '17 at 22:02
  • OK - I see that a change was made. That's a pretty big change! – Marc Jun 19 '17 at 22:04
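
A minimal sketch of the change described in the comments, assuming data.table 1.9.8 or later. With no `by` argument, `unique()` compares all columns; passing `by = key(dt)` compares only the key column, which reproduces the older result. The names `all_cols` and `key_only` are illustrative only.

```r
library(data.table)

dt <- data.table(
  V1 = LETTERS[c(1, 1, 1, 1, 2, 3, 3, 5, 7, 1)],
  V2 = LETTERS[c(2, 3, 4, 2, 1, 4, 4, 6, 7, 2)]
)
setkey(dt, V2)

# Default (assumes data.table >= 1.9.8): rows are compared on every column
all_cols <- unique(dt)

# Compare on the key column only, as older versions did for keyed tables
key_only <- unique(dt, by = key(dt))

nrow(all_cols)
nrow(key_only)
```

On the example data, the first call returns 7 rows and the second returns 6, one per distinct value of V2.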
