Reference: While trying to answer this very basic question, I suddenly realized that I couldn't display row names in a data.table object.

Toy example

library(data.table)
DT <- data.table(A = letters[1:3])
DT
##    A
## 1: a
## 2: b
## 3: c
row.names(DT) <- 4:6
row.names(DT)
## [1] "4" "5" "6" # seems to work

or

rownames(DT) <- 7:9
rownames(DT)
## [1] "7" "8" "9" # seems to be ok too

But when the data itself is displayed, the row names remain unchanged:

DT
##    A
## 1: a
## 2: b
## 3: c

I would assume data.table ignores unnecessary attributes for efficiency, but `attributes()` seems to disagree: the new row names are stored on the object.

attributes(DT)
# $names
# [1] "A"
# 
# $row.names
# [1] 7 8 9
# 
# $class
# [1] "data.table" "data.frame"
# 
# $.internal.selfref
# <pointer: 0x0000000000200788>
David Arenburg
  • It should be pretty easy to modify `data.table:::print.data.table` to print the actual row names. I'd have to do some benchmarking, but I don't think you'd lose too much performance. It's a design decision. Personally, I don't see much use for row names in data.tables. You'd never display all of them if the DT is reasonably large, and for every other use it should be preferable to have this information in a column. – Roland Jun 13 '14 at 07:17
  • In fact, I suspect data.tables only have a row.names attribute for their data.frame legacy. – Roland Jun 13 '14 at 07:29
  • @Roland, I gave a specific example where I needed the row names, so it wasn't some arbitrary discussion, but I see where you're going. I wonder why Josh erased his answer... – David Arenburg Jun 13 '14 at 09:24
  • data.tables don't have row names, on purpose. Having row names is a bad design choice - simply store the data as an extra column. – eddi Jun 13 '14 at 14:46
  • @eddi: Bad design choice. Hmm. Care to expound? – asb Jun 13 '14 at 15:33
  • @asb sure - in terms of what data row names represent they are a subset of what an extra column can represent (as row names for a `data.frame` are only a character vector, while columns can be anything), but in terms of their usability row names are far more restrictive and much harder to use than an extra column (and this last point is amplified by how easy it is to use columns in `data.table`, but is also there in `data.frame`) – eddi Jun 13 '14 at 15:38
  • @eddi: Thanks. What I have been thinking of is `pandas` which goes as far as to allow multiple indexing. Do you see the two as different designs? – asb Jun 13 '14 at 15:51
  • @asb not sure what `pandas` multiple indexing does - is it a key with multiple columns (which exists in `data.table`), or multiple keys per data (which doesn't exist in `data.table`)? Either way I don't really see what row names have to do with that. – eddi Jun 13 '14 at 16:00
  • In my experience I tend to use the `rownames` of `data.frame`s as a method for row indexing. `data.table`'s key system removes the need to use `rownames` to do this. – Scott Ritchie Jun 14 '14 at 07:25
  • @eddi, I think it would be good if you posted a detailed answer here for the benefit of future readers, so that this question gets out of the "unanswered" queue – David Arenburg Jun 14 '14 at 18:24
  • If you are converting a data.frame `DF` with row names, you can use `DT <- data.table(DF, keep.rownames = TRUE)` which will create a column with the row names, named `rn`. Still, the data.table `DT` will not have row names. https://cran.r-project.org/web/packages/data.table/data.table.pdf – Konstantinos Jan 27 '16 at 22:43
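
The conversion described in the last comment can be sketched as follows. This is only a minimal illustration: the data.frame `DF` and its row names are made up, and it uses `as.data.table()`, which accepts the same `keep.rownames` argument as the `data.table()` call in the comment.

```r
library(data.table)

# Hypothetical data.frame whose row names carry information
DF <- data.frame(A = letters[1:3], row.names = c("x", "y", "z"))

# keep.rownames = TRUE copies the row names into a regular column named "rn"
DT <- as.data.table(DF, keep.rownames = TRUE)

names(DT)  # the row names now appear as the first column
DT$rn      # and can be used like any other column
```

After the conversion the labels are ordinary data, so they print with the table and can be filtered, joined on, or renamed.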

1 Answer

This is more or less verbatim from comments.

data.table doesn't support row names. This is intentional: row names are a bad design choice. They are far more cumbersome to use than columns (especially in data.table, where columns are so much easier to work with than in data.frame), and they can represent only a subset of what a column can (recall that row names in a data.frame are only a character vector, whereas columns can be anything).
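
As a rough sketch of the column-based alternative, the labels that would have been row names can live in an ordinary column that is then set as the key. The column name `id` and the labels are hypothetical, chosen only for illustration.

```r
library(data.table)

# Labels that would otherwise be row names are stored in a regular column
DT <- data.table(id = c("r4", "r5", "r6"), A = letters[1:3])

# Setting a key sorts the table by `id` and enables fast lookups on it
setkey(DT, id)

# A character value in i is matched against the key column
row_r5 <- DT["r5"]
val_r5 <- DT["r5", A]
```

Unlike row names, the `id` column is printed with the data, and it could just as well be an integer, a date, or one of several key columns.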

eddi
  • Looking at `pandas.Index`, I don't feel like excusing `data.table` for not implementing a good Index class. – shouldsee May 10 '19 at 22:29
  • The data.table team agrees with you. Which is why data.table has such powerful indexing operations, see their introduction/doc: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html. row.names might be usable for indexing, but they are not a GOOD index class. – glenn Nov 04 '20 at 16:46
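
To give a flavour of the indexing operations mentioned in the last comment, here is a small sketch using `on=` lookups and a secondary index. The table and its column names are assumptions made for the example, not something taken from the thread.

```r
library(data.table)

DT <- data.table(id = c("r4", "r5", "r6"), A = letters[1:3])

# Ad hoc lookup: on= names the column to match against, no key required
hit <- DT[.("r5"), on = "id"]

# A secondary index on `A` is stored with the table and leaves row order unchanged
setindex(DT, A)
indices(DT)

# Lookups on any column follow the same pattern
hit2 <- DT[.("c"), on = "A"]
```

Any column, or combination of columns, can serve as the lookup target this way, which a single row names vector cannot offer.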