There are two functions in the R core library.

  • row.names: Get and Set Row Names for Data Frames
  • rownames: Retrieve or set the row names of a matrix-like object.

However, the docs for row.names state: "For a data frame, ‘rownames’ and ‘colnames’ eventually call ‘row.names’ and ‘names’ respectively, but the latter are preferred." Why is row.names preferred? Wouldn't it be easier to ignore row.names and just call rownames?

Evan Carroll
  • That link doesn't help at all. – Rich Scriven Jul 19 '16 at 19:06
  • @RichardScriven If this question gets a good answer, perhaps that other question should be closed as a duplicate of this one. – Matthew Lundberg Jul 19 '16 at 19:07
  • One piece of the puzzle, I think, is in the word "eventually." Since `rownames` eventually calls `row.names` for a data.frame, it would be more efficient to cut out the middleman and go straight to the source. Another piece, I think, is that this documentation focuses on data.frames. – lmo Jul 19 '16 at 19:17
  • Note that a "data.frame" has an explicit "row.names" attribute and not a "rownames" one. Also, `row.names` is a generic function that gets this specific attribute of the object, and methods can be created for objects similar to "data.frame". – alexis_laz Jul 19 '16 at 19:24
  • Looks like cross-compatibility to me. `names(iris)` and `colnames(iris)` both work. I suspect the authors were kind enough to make sure that old-school programmers coming from S or early R could still use the old functionality, while new-school users can use the new functions. So the language looks kind of Frankenstein after a while, but it's a good thing not to have to remember which function goes with which data type. – Pierre L Jul 19 '16 at 20:50
  • I think [this discussion here is relevant](http://stackoverflow.com/a/8857411/4564247) to this question. It gets into the evolution of R and the seeming oddities that can result. – Pierre L Jul 20 '16 at 13:11
  • From the help file `?.row_names_info`, you can see that `row.names.default` which ultimately calls `row.names`, which can be specified to give the number of rows prespecified by the attribute, as well as automatically generate rows. This "compact form" is desirable. – shayaa Jul 24 '16 at 08:08

1 Answer

row.names() is an S3 generic function, whereas rownames() is a lower-level, non-generic function. rownames() is in effect the default method for row.names(): it is what gets applied to any object in the absence of a more specific method.
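A minimal sketch of that distinction, using a small made-up matrix `m`. Checking for `UseMethod` in the function body is just one rough way to spot an S3 generic, not an official test.

```r
# A small matrix with row names. There is no row.names() method for
# matrices, so the default method is the one that gets used.
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))

# row.names() is an S3 generic: its body is a call to UseMethod().
is_generic <- any(grepl("UseMethod", deparse(body(row.names))))

# rownames() is an ordinary function: no UseMethod() in its body.
is_plain <- !any(grepl("UseMethod", deparse(body(rownames))))

# With no specific method available, both functions give the same answer.
same_on_matrix <- identical(row.names(m), rownames(m))

print(c(is_generic = is_generic, is_plain = is_plain,
        same_on_matrix = same_on_matrix))
```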

If you are operating on a data frame x, then it is more efficient to use row.names(x) because there is a specific row.names() method for data frames. The row.names() method for data frames simply extracts the "row.names" attribute that is already stored in x. By contrast, because of the definition of rownames() and the inter-relationships between the functions, rownames(x) has to extract all the dimension names of x, then drop the column names, then combine with names(x), then drop names(x) again. This process even involves a call to row.names(x) as an intermediate step. This will all usually happen so quickly that you don't notice it, but just extracting the attribute is obviously more efficient.
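To make the two routes concrete, here is a small sketch on a made-up data frame `df`. It only checks that the routes agree and does not try to measure the speed difference, which is tiny for objects this size.

```r
# A made-up data frame with explicit row names.
df <- data.frame(value = c(10, 20), row.names = c("first", "second"))

# Direct route: the data frame method reads the stored "row.names" attribute.
from_method <- row.names(df)
from_attr   <- attr(df, "row.names")

# Indirect route: rownames() asks for dimnames(df), a list holding both
# the row names and the column names, and keeps only the first element.
dn            <- dimnames(df)
from_rownames <- rownames(df)

print(identical(from_method, from_attr))      # attribute and method agree
print(identical(dn[[2]], names(df)))          # column names came along too
print(identical(from_rownames, from_method))  # same result, longer route
```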

It would be logical to just use the generic version row.names() all the time, since it always dispatches to the appropriate method. There is no practical advantage in using rownames(x) over row.names(x). For object classes that define a row.names method, rownames(x) is risky because it bypasses that method. For object classes with no row.names method, the two functions are equivalent because row.names(x) simply calls rownames(x).
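As an illustration of the bypass, here is a sketch with a hypothetical class `"tagged"` that I made up for this purpose. It keeps its labels in a `"tags"` attribute rather than in dimnames, so only its row.names method knows where to look.

```r
# Hypothetical class "tagged": a plain list whose labels live in a
# "tags" attribute (an assumption for illustration, not a real class).
obj <- structure(list(1, 2), tags = c("r1", "r2"), class = "tagged")

# A row.names method for the hypothetical class.
row.names.tagged <- function(x) attr(x, "tags")

# The generic dispatches to the method above.
via_generic <- row.names(obj)

# rownames() only consults dimnames(obj); none are set, so it never
# sees the labels.
via_rownames <- rownames(obj)

print(via_generic)
print(via_rownames)
```

A class can avoid this mismatch by also providing a matching dimnames method, which is what data frames do.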

The reason why both functions exist is historical. rownames() is the older function and was part of the R language before generic functions and methods were introduced. It was intended only for use on matrices, but it will work fine on any data object that has a dimnames attribute. I personally use rownames(x) when x is a matrix and row.names(x) otherwise but, as I have said, one could just as well use row.names(x) all the time.

Gordon Smyth