
This (very basic) question is the result of an exchange here.

The documentation for setkey() states:

setkey() sorts a data.table and marks it as sorted. The sorted columns are the key. The key can be any columns in any order. The columns are sorted in ascending order always. The table is changed by reference... (emphasis added)

I have always interpreted this to mean that setkey() creates an index, rather than physically rearranging the rows of the data table (similar to indexing a database table). But if this were true, then removing the key (using setkey(DT,NULL)) should remove the index and restore the data table to its original, unsorted order. This is not what happens:

library(data.table)
DT <- data.table(a=3:1, b=1:3, c=5:7); DT
#    a b c
# 1: 3 1 5
# 2: 2 2 6
# 3: 1 3 7
setkey(DT,a); DT
#    a b c
# 1: 1 3 7
# 2: 2 2 6
# 3: 3 1 5
setkey(DT,NULL); DT
#    a b c
# 1: 1 3 7
# 2: 2 2 6
# 3: 3 1 5
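One way to check whether the rows were physically moved, rather than merely displayed in key order by the print method, is to pull out the underlying column vectors, which bypasses print() entirely. This is a minimal sketch on the same data as above; the names a_col, b_col and c_col are illustrative only.

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)
setkey(DT, a)

# Extract the raw column vectors; no print method is involved here.
a_col <- DT[["a"]]
b_col <- DT[["b"]]
c_col <- DT[["c"]]
print(a_col)  # [1] 1 2 3
print(b_col)  # [1] 3 2 1
print(c_col)  # [1] 7 6 5

# The key itself is stored separately, as the attribute "sorted".
print(attr(DT, "sorted"))  # [1] "a"
```

If setkey() only built an index, these vectors would still read 3 2 1, 1 2 3 and 5 6 7.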

So two questions:

1: If the rows are rearranged (sorted), then what does "changed by reference" mean?

2: What does setkey(DT,NULL) do exactly?

jlhoward
    I don't know the answer, but keep in mind that just because the table is _displayed_ as sorted does not mean that it was sorted when you set the key. Typing `DT` at the console is essentially the same as calling a print function, and that function might be doing the sorting. – joran Nov 19 '13 at 16:21

1 Answer

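  1. The rows really are sorted. "Changed by reference" means the sorting happens in place, in the memory DT already occupies: the entire table is not copied, the rows are simply swapped into key order, and no assignment such as DT <- ... is needed.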

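  2. setkey(DT, NULL) is equivalent to setattr(DT, "sorted", NULL). It simply removes the "sorted" attribute, i.e. the key. It does not move any rows, so the table stays in whatever order the last setkey() left it.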
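Both points can be observed directly. The sketch below is illustrative, not taken from the answer: DT_alias, DT_snapshot and DT2 are hypothetical names. It relies on plain R assignment not copying a data.table, so an alias sees the in-place reordering, while copy() produces an independent table that keeps the original order.

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)
DT_alias <- DT           # plain assignment: no copy, both names refer to one object
DT_snapshot <- copy(DT)  # explicit deep copy, unaffected by what follows

# Point 1: setkey() reorders the rows in place. No `DT <- ...` is needed, and
# DT_alias sees the new order because it is the very same object.
setkey(DT, a)
alias_a <- DT_alias[["a"]]        # 1 2 3
snapshot_a <- DT_snapshot[["a"]]  # 3 2 1 (the copy kept the original order)

# Point 2: setkey(DT, NULL) drops the "sorted" attribute (the key), but the
# rows are not moved back: the previous order was never stored anywhere.
setkey(DT, NULL)
key_after <- key(DT)              # NULL
a_after <- DT[["a"]]              # still 1 2 3

# Removing the attribute directly with setattr() leaves the rows sorted too.
DT2 <- data.table(a = 3:1, b = 1:3, c = 5:7)
setkey(DT2, a)
setattr(DT2, "sorted", NULL)
```

If the original row order matters, it has to be kept explicitly, for example by working on a copy() as DT_snapshot does here.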

eddi
    Note also that in the second example `dput(DT)` has actually changed: besides gaining a `sorted` attribute, the column vectors themselves are in the new order. – Señor O Nov 19 '13 at 16:57