Set display order of data.table `(group, -value.1)` while preserving key `id`

Question

Is it possible to store the order of rows in a data.table while preserving its keys?

Lets say I have the following dummy table:

library(data.table)
dt <- data.table(id=letters[1:6], 
                   group=sample(c("red", "blue"), replace=TRUE), 
                   value.1=rnorm(6), 
                   value.2=runif(6))
setkey(dt, id)
dt
   id group    value.1    value.2
1:  a  blue  1.4557851 0.73249612
2:  b   red -0.6443284 0.49924102
3:  c  blue -1.5531374 0.72977197
4:  d   red -1.5977095 0.08033604
5:  e  blue  1.8050975 0.43553048
6:  f   red -0.4816474 0.23658045

I would like to store this table so that rows are ordered by group, and by value.1 in decreasing order, i.e:

> dt[order(group, value.1, decreasing=T),]
   id group    value.1    value.2
1:  f   red -0.4816474 0.23658045
2:  b   red -0.6443284 0.49924102
3:  d   red -1.5977095 0.08033604
4:  e  blue  1.8050975 0.43553048
5:  a  blue  1.4557851 0.73249612
6:  c  blue -1.5531374 0.72977197

Obviously I can save this as a new variable, but I also want to keep the id column as my primary key.

Arun's answer to "What is the purpose of setting a key in data.table?" suggests that this can be achieved with clever use setkey, since it orders the data.table in the order of its keys (although there is no option to set the key to decreasing order):

> setkey(dt, group, value.1, id)
> dt
   id group    value.1    value.2
1:  c  blue -1.5531374 0.72977197
2:  a  blue  1.4557851 0.73249612
3:  e  blue  1.8050975 0.43553048
4:  d   red -1.5977095 0.08033604
5:  b   red -0.6443284 0.49924102
6:  f   red -0.4816474 0.23658045

However, I lose the ability to use id as my primary key, because group is the first key provided:

> dt["a"]
   group id value.1 value.2
1:     a NA      NA      NA

Not an exact duplicate of http://stackoverflow.com/questions/15597685/subsetting-data-table-by-2nd-column-only-of-a-2-column-key-using-binary-search/15597713, however I think this will only be possible when [FR#1007](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1007&group_id=240&atid=978) is implemented (secondary keys) — mnel, May 16 '14 at 01:41
Maybe I'm not understanding it right, but this example doesn't really need sorting in decreasing order, isn't it? All values in `id` are unique... — Arun, May 16 '14 at 01:50
@mnel I did try to follow that chain of questions, but it wasn't clear to me whether secondary keys would have the same issue. — Scott Ritchie, May 16 '14 at 01:54
@Arun I want to sort by `value.1` within each group, but want to keep `id` as the primary key. Does that make sense? — Scott Ritchie, May 16 '14 at 01:58
@ScottRitchie, yes. Thanks. Could you elaborate a bit on the type of problem for which you're requiring this (subset based on different keys)? — Arun, May 16 '14 at 02:13
It's mostly aesthetic. I'm looking at some group of genes, and I'm ranking them within each group by network connectivity, so it's useful for me to be able to quickly see the `data.table` in that order when talking to my supervisor and collaborators. — Scott Ritchie, May 16 '14 at 02:30
Essentially, I want the `data.table` to behave exactly like `dt; setkey(dt, id)`, but for `show`/`print` to behave as if I'd called `dt[order(group, value.1, decreasing=TRUE)]`. My solution at the moment is to save two tables: one which is keyed, and one which is stored in the order I want. — Scott Ritchie, May 16 '14 at 02:32
It is not obvious from your question should the `id` be the primary key. If not, you can still search the table by `id` even `id` is not a key: `dt[id == "a"]` — djhurio, May 20 '14 at 12:02
A workaround to get decreasing order is to negate the key field: `dt[, Col := -Col]` — smci, Apr 14 '15 at 15:03
@mnel: FR#1007 on R-forge is no longer accessible to us, but since dt project has moved to github I assume it was implemented, can you confirm? (and provide updated link) — smci, Apr 23 '18 at 21:36

score 3 · Answer 1 · answered May 16 '14 at 15:26

Sounds like you simply want to modify print.data.table:

print.data.table = function(x, ...) {
  # put whatever condition identifies your tables here
  if ("group" %in% names(x) && "value.1" %in% names(x)) {
    data.table:::print.data.table(x[order(group, value.1, decreasing = T)], ...)
  } else {
    data.table:::print.data.table(x, ...)
  }
}

set.seed(2)
dt = data.table(id=letters[1:6], 
               group=sample(c("red", "blue"), replace=TRUE), 
               value.1=rnorm(6), 
               value.2=runif(6))
setkey(dt, id)
dt
#   id group     value.1    value.2
#1:  a   red  0.18484918 0.40528218
#2:  e   red  0.13242028 0.44480923
#3:  c   red -1.13037567 0.97639849
#4:  b  blue  1.58784533 0.85354845
#5:  f  blue  0.70795473 0.07497942
#6:  d  blue -0.08025176 0.22582546

dt["c"]
#   id group   value.1   value.2
#1:  c   red -1.130376 0.9763985

I was hoping there would be an inbuilt solution, but good idea to modify `print.data.table`. — Scott Ritchie, May 21 '14 at 01:47

score 0 · Answer 2 · answered May 20 '14 at 13:23

0

I think you can still search by id only, as follows:

dt[J(unique(group),unique(value.1),"a"), nomatch=0]
   group   value.1 id   value.2
1:  blue 0.4928595  a 0.3311728

from what I gathered unique(column_name) is the way to include all values for that column.

I am not sure if this helps.

answered May 20 '14 at 13:23

Timothée HENRY

14,294
21
96
136

I get an empty table when I try that. – Scott Ritchie May 21 '14 at 01:43

score 0 · Accepted Answer · answered May 21 '14 at 02:33

Building on @eddi's answer, I've created a hackish solution where I store an unevaluated call to order as an attribute of the data.table, which print.data.table obeys:

set_order <- function(dt, cols, decreasing=FALSE) {
  # Store a call to order as an additional attribute
  attr(dt, "order") <- paste0("order(", paste(cols, collapse=", "), 
                              ", decreasing=", decreasing, ")")
  invisible(dt)
}

print.data.table = function(x, ...) {
  if (!is.null(attr(x, "order"))) {
    # Use the stored ordering to print the data.table
    data.table:::print.data.table(x[eval(parse(text=attr(x, "order")))], ...)
  } else {
    data.table:::print.data.table(x, ...)
  }
}

Giving me the behaviour I want:

dt <- set_order(dt, c("group", "value.1"), decreasing=T)
dt
#    id group    value.1    value.2
# 1:  f   red -0.4816474 0.23658045
# 2:  b   red -0.6443284 0.49924102
# 3:  d   red -1.5977095 0.08033604
# 4:  e  blue  1.8050975 0.43553048
# 5:  a  blue  1.4557851 0.73249612
# 6:  c  blue -1.5531374 0.72977197

tables()
#      NAME NROW MB COLS                     KEY
# [1,] dt      6 1  id,group,value.1,value.2 id 
# Total: 1MB

I noticed data.tables have a `"sorted"` attributed, however changing this doesn't affect the order the table is printed in. — Scott Ritchie, May 21 '14 at 02:54

Set display order of data.table `(group, -value.1)` while preserving key `id`

3 Answers3