
I'm losing the row names of my data. It appears to happen when the rbind function is used on a data.table.

Here's an example showing what should happen:

library(data.table)
allData <- NULL
for (itest in seq(3)) {
  pts <- NULL
  npts <- 4
  # build a small matrix row by row
  for (ipt in seq(npts)) {
    pp <- c(ipt, ipt*2, ipt^3)
    pts <- rbind(pts, pp)
  }
  colnames(pts) <- c('A','B','C')
  # label each row with its test and point number, e.g. "test_1_1"
  rownames(pts) <- paste('test', itest, seq(npts), sep='_')
  # pts <- data.table(pts)
  print(pts)
  # stacking plain matrices keeps their row names
  allData <- rbind(allData, pts)
}
print(allData)

The final print(allData) gives:

         A B  C
test_1_1 1 2  1
test_1_2 2 4  8
test_1_3 3 6 27
test_1_4 4 8 64
test_2_1 1 2  1
test_2_2 2 4  8
test_2_3 3 6 27
test_2_4 4 8 64
test_3_1 1 2  1
test_3_2 2 4  8
test_3_3 3 6 27
test_3_4 4 8 64

When a data.table is used instead, the row names are lost:

library(data.table)
allData <- NULL
for (itest in seq(3)) {
  pts <- NULL
  npts <- 4
  for (ipt in seq(npts)) {
    pp <- c(ipt, ipt*2, ipt^3)
    pts <- rbind(pts, pp)
  }
  colnames(pts) <- c('A','B','C')
  # convert to a data.table before setting the row names
  pts <- data.table(pts)
  rownames(pts) <- paste('test', itest, seq(npts), sep='_')
  allData <- rbind(allData, pts)
}
print(allData)

Output with data.table:

    A B  C
 1: 1 2  1
 2: 2 4  8
 3: 3 6 27
 4: 4 8 64
 5: 1 2  1
 6: 2 4  8
 7: 3 6 27
 8: 4 8 64
 9: 1 2  1
10: 2 4  8
11: 3 6 27
12: 4 8 64

How should the code be modified to keep the row names?

  • In response to the deleted comment: using data.table(pts, keep.rownames=TRUE) does fix the problem in the sample code as originally posted, but it doesn't fix the problem in my actual code. I'll update the sample. –  Jan 02 '15 at 13:22
  • `data.table` objects don't have `row.names`, period. If you want to keep them, use the suggestion mentioned in the comments. See [this question](http://stackoverflow.com/questions/24199533/display-row-names-in-a-data-table-object) – David Arenburg Jan 02 '15 at 13:29
  • I would create a new column called Test and store the row names in that variable. – talat Jan 02 '15 at 13:32
  • Also, is your example real? There is truly no need for double loops and growing objects here. You can do this whole thing in one or two vectorized lines. – David Arenburg Jan 02 '15 at 13:38
  • OK, that's informative. I'll move the "row name" information into columns to keep track of it for later use. Let's consider this question closed. I may ask a different question related to labeling cluster points, since they are being labeled with integer values; I assumed the labels were coming from the row names, which appear to be lost in processing. –  Jan 02 '15 at 13:39
  • Thanks for the comment regarding vectorized lines vs. loops. The outer loop is similar to the one posted, but the inner loop is artificial. The inner data is generated by a number of functions that use vectorized lines. The outer loop executes only a small number of times (so far in development), so the data generation runs pretty fast (so far). Does using one vectorized line avoid growing objects? Or, better yet, is there a way to write code that avoids growing objects? –  Jan 02 '15 at 13:45
  • OK, answer added. –  Jan 02 '15 at 14:09
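A minimal sketch of the `keep.rownames = TRUE` suggestion from the comments, applied to the sample data. The matrix row names are copied into an ordinary column (data.table names it `rn`), and that column survives `rbind()` like any other column. For brevity the sketch builds each matrix with `cbind()` instead of the original inner loop; that substitution is an assumption of this example, not part of the question.

```r
library(data.table)

allData <- NULL
for (itest in seq(3)) {
  npts <- 4
  ipt <- seq(npts)
  # build the whole matrix at once instead of growing it row by row
  pts <- cbind(A = ipt, B = ipt * 2, C = ipt^3)
  rownames(pts) <- paste('test', itest, ipt, sep = '_')
  # keep.rownames = TRUE copies the matrix row names into a column named "rn"
  pts <- data.table(pts, keep.rownames = TRUE)
  # "rn" is a normal column, so rbind() carries it along
  allData <- rbind(allData, pts)
}
print(allData)
```

As the asker notes, this only helps where the row names exist at the moment of conversion; once data is already in a data.table, the identifiers have to be kept in a column from the start.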

1 Answer


Based on the information in the comments and the reference below, data.tables do not store row names, so row names should not be used with data.tables. If a row name contains needed identifying information, that information should be moved into one or more additional columns of the data.table.
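A sketch of this approach that also touches on the question about growing objects: each test's rows are built as a separate data.table with an explicit identifier column (the column name `id` is a choice made for this example), the pieces are collected in a list with `lapply()`, and they are combined once with `rbindlist()`. The formulas for A, B and C are the ones from the sample code; in the real code the values would come from the asker's own functions.

```r
library(data.table)

npts <- 4
ntests <- 3

# one small data.table per test, carrying its identifier in an "id" column
pieces <- lapply(seq(ntests), function(itest) {
  ipt <- seq(npts)
  data.table(id = paste('test', itest, ipt, sep = '_'),
             A = ipt, B = ipt * 2, C = ipt^3)
})

# combine once at the end instead of growing allData inside the loop
allData <- rbindlist(pieces)
print(allData)
```

Collecting the pieces in a list and binding once avoids repeatedly copying the growing result, which is what `allData <- rbind(allData, pts)` does on every iteration.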

Reference: page 7 of the Package ‘data.table’ manual, revision of December 22, 2014, which says:

A data.table is a list of vectors, just like a data.frame. However :
1. it never has rownames. Instead it may have one key of one or more columns. 
This key can be used for row indexing instead of rownames.

where "it" refers to a data.table object, and the text was reformatted to fit in the horizontal space here.
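The quoted passage says a key can be used for row indexing in place of row names. A minimal sketch, assuming the identifiers are stored in a column named `id` (a hypothetical name chosen for this example): `setkey()` makes `id` the key, after which a character `i` is matched against the first key column.

```r
library(data.table)

ipt <- rep(1:4, times = 3)
allData <- data.table(id = paste('test', rep(1:3, each = 4), ipt, sep = '_'),
                      A = ipt, B = ipt * 2, C = ipt^3)

# make "id" the key; note that setkey() sorts the table by id
setkey(allData, id)

# a character i is looked up in the key column, much like indexing by row name
row <- allData["test_2_3"]
print(row)
```

Because `setkey()` reorders the rows, a key is best treated as an index for lookups rather than as a record of the original row order; here the identifiers happen to sort in their original order.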