This (very basic) question is the result of an exchange here.
The documentation for setkey() states:
setkey() sorts a data.table and marks it as sorted. The sorted columns are the key. The key can be any columns in any order. The columns are sorted in ascending order always. The table is changed by reference... (emphasis added)
I have always interpreted this to mean that setkey() creates an index, rather than physically rearranging the rows of the data table (similar to indexing a database table). But if this were true, then removing the key (using setkey(DT,NULL)) should remove the index and restore the data table to its original, unsorted order. This is not what happens:
library(data.table)
DT <- data.table(a=3:1, b=1:3, c=5:7); DT
   a b c
1: 3 1 5
2: 2 2 6
3: 1 3 7
setkey(DT,a); DT
   a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
setkey(DT,NULL); DT
   a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
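A further check that may help narrow this down is sketched here. It records what key() and address() report at each step. The names DT2, addr_before, addr_after, key_after_set and key_after_null are introduced only for this illustration. I am assuming that address() reflects whether the data.table object itself was replaced.

```r
library(data.table)

# Same data as above, under a new (illustrative) name
DT2 <- data.table(a = 3:1, b = 1:3, c = 5:7)

addr_before <- address(DT2)   # memory address of the object before keying
setkey(DT2, a)                # sort by column a and mark a as the key
addr_after <- address(DT2)    # memory address after keying
key_after_set <- key(DT2)     # name(s) of the key column(s)

setkey(DT2, NULL)             # drop the key
key_after_null <- key(DT2)    # key after dropping it

# If the two addresses match, the object was modified in place, not copied
print(identical(addr_before, addr_after))
print(key_after_set)
print(key_after_null)
print(DT2)                    # row order after the key has been removed
```

If the addresses match, that would suggest "changed by reference" refers to modifying the existing object in place rather than to leaving the row order untouched.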
So two questions:
1: If the rows are rearranged (sorted), then what does "changed by reference" mean?
2: What does setkey(DT,NULL) do exactly?