
Given a data frame, DF, it is simple to save DF as an R object using save() and share it with co-workers. However, it is often necessary to attach a separate document explaining the precise column definitions. Is there a (standard/common) way to include this information with the object itself?

If we built a package for DF, we could create a help page explaining all these details, like those for the built-in datasets. The data and its explanation would then always be available together, and we would need to share only a single package source file. However, building a package seems like overkill for this problem. (As a side benefit, we would gain version control on the data set, since changes would increment the package version number.)
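
For the package route, base R can at least draft the help page. A minimal sketch, assuming the usual package layout (data in `data/`, help pages in `man/`): `utils::promptData()` writes an Rd skeleton whose `\format` section already lists each column, leaving only the descriptions to fill in. The small `DF` below is a hypothetical stand-in for the real data.

```r
# Hypothetical stand-in data frame; substitute your own DF
DF <- data.frame(
  Gender   = factor(c("Female", "Male")),
  Dose     = c(15, 11),
  Reaction = c(7.98, 6.02)
)

# Write an Rd documentation skeleton for DF to a temporary file.
# In a real package this would become man/DF.Rd, with the data
# itself saved as data/DF.rda.
rd_file <- file.path(tempdir(), "DF.Rd")
utils::promptData(DF, filename = rd_file)

# Inspect the skeleton: \name, \docType, a \format section with one
# \item per column, and placeholders for description and source
rd <- readLines(rd_file)
cat(rd, sep = "\n")
```

Once the placeholders are filled in, the built package tarball carries data and documentation together; roxygen2 can generate the same Rd file from comments if you prefer to keep the text next to code.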

The Hmisc package includes the label() function, which adds a new attribute to objects. It also provides methods for subsetting, creating, and otherwise manipulating data frames that propagate this attribute (since most functions drop non-essential attributes).

Setting attributes is an obvious alternative to writing a package, and we can add arbitrarily named attributes.
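
One variation on this idea, sketched under the assumption that a named character vector is enough: keep every column definition in a single data-frame-level attribute (the name `codebook` is hypothetical; any name works) and check that it survives the `save()`/`load()` round trip used to share the object. Like any custom attribute, it can still be dropped by operations that rebuild the data frame.

```r
DF <- data.frame(
  Dose     = c(15, 10, 11),
  Reaction = c(7.98, 11.70, 9.96)
)

# Hypothetical attribute name "codebook": one definition per column
attr(DF, "codebook") <- c(
  Dose     = "Dose given (placeholder definition)",
  Reaction = "Time to react to eye-dot test, in seconds, recorded electronically"
)

# Save and reload into a fresh environment, as a co-worker would
f <- tempfile(fileext = ".RData")
save(DF, file = f)
shared <- new.env()
load(f, envir = shared)

# The definitions travel with the object
attr(shared$DF, "codebook")[["Reaction"]]
```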

A brief example:

```r
DF <-
structure(list(Gender = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Female",
"Male"), class = "factor"), Date = structure(c(15518, 15524,
15518, 15526, 15517, 15524), class = "Date"), Dose = c(15, 10,
11, 11, 12, 14), Reaction = c(7.97755180189919, 11.7033586194156,
9.959784869289, 6.0170950790238, 1.92480908119655, 7.70265419443507
)), .Names = c("Gender", "Date", "Dose", "Reaction"), row.names = c(NA,
-6L), class = "data.frame")

library(Hmisc)

label(DF$Reaction) <- "Time to react to eye-dot test, in seconds, recorded electronically"

# or we could set our own attributes
attr(DF$Date, "Description") <- "Date of experiment. Note, results are collected weekly from test centres"

# Since Hmisc adds class "labelled" to the labelled column and implements
# the appropriate methods, the label is retained on subsetting
# (not that this feature is necessarily wanted)
DF.mini <- DF[DF$Gender == "Male", ]

# compare
str(DF)      # Not quite sure why str() prints the label attribute but not the Description
str(DF.mini) # we retain the label attribute

attributes(DF$Date)
attributes(DF.mini$Date) # we lose the Description attribute
```
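
If attributes set with `attr()` should survive row subsetting without relying on Hmisc's `labelled` class, one workaround is to copy them back after the subset. A minimal sketch; `subset_keep_attrs()` is a hypothetical helper, not part of base R or Hmisc. It restores only attributes missing from the result, so class and factor levels stay as `[` produced them.

```r
# Hypothetical helper: row-subset a data frame, then re-attach any
# column attributes that `[` dropped along the way
subset_keep_attrs <- function(df, i) {
  out <- df[i, , drop = FALSE]
  for (nm in names(df)) {
    lost <- setdiff(names(attributes(df[[nm]])),
                    names(attributes(out[[nm]])))
    for (a in lost) attr(out[[nm]], a) <- attr(df[[nm]], a)
  }
  out
}

DF <- data.frame(
  Gender = factor(c("Female", "Male", "Male")),
  Date   = as.Date(c("2012-06-27", "2012-07-05", "2012-06-26"))
)
attr(DF$Date, "Description") <- "Date of experiment"

DF.mini <- subset_keep_attrs(DF, DF$Gender == "Male")
attr(DF.mini$Date, "Description")
```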

So my questions:

  1. Do people include extra information with their objects (my example is a data frame, but the question applies to all R objects), keeping all the relevant information in one place?
  2. If yes, how?
  3. Out of curiosity, why does str() print the label attribute? I believe the Hmisc package has added another function/method somewhere, but I couldn't see where. Can someone explain that bit?
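
Whatever the explanation for `str()`'s behaviour, a small helper can list the descriptive attributes of every column directly. A sketch only: `column_metadata()` is hypothetical, and which attributes count as "structural" (and are therefore skipped) is an assumption you may want to adjust.

```r
# Hypothetical helper: per-column list of attributes, minus structural ones
column_metadata <- function(df, skip = c("class", "levels", "names")) {
  lapply(df, function(col) {
    a <- attributes(col)
    a[setdiff(names(a), skip)]
  })
}

DF <- data.frame(
  Date = as.Date(c("2012-06-27", "2012-07-05")),
  Dose = c(15, 10)
)
attr(DF$Date, "Description") <- "Date of experiment"

column_metadata(DF)
```
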
Gavin Simpson
Simon
  • Packaging data together with associated .Rd files is a clean solution, and closest to a 'standard' approach. Alternatively, use `Hmisc::label()`, `comment()`, or `attr()` as mentioned in responses to [this very similar question](http://stackoverflow.com/questions/7919618/how-to-add-documentation-to-a-data-frame-in-r/7919721#7919721). So you basically answered your own questions, except for the bit about `str()` ... – Josh O'Brien Jul 05 '12 at 16:23
  • ... which probably has to do with `Hmisc`'s `print.labelled()` method. – Josh O'Brien Jul 05 '12 at 16:31
  • Related question that had some really good answers: http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets – Ari B. Friedman Jul 05 '12 at 16:41
  • This was related to documenting a function, but the same concepts apply: http://stackoverflow.com/questions/6324568/function-commenting-conventions-in-r/6325030#6325030 – Chase Jul 05 '12 at 17:46

1 Answer

There is a base function, `comment()`, which can assign or retrieve text stored in an attribute.
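
A minimal sketch of `comment()` on a single column and on the whole data frame. According to its help page, the comment attribute is not printed by `print()`, so it stays out of the way until you ask for it.

```r
DF <- data.frame(Reaction = c(7.98, 11.70, 9.96))

# Attach free text to a single column...
comment(DF$Reaction) <- "Time to react to eye-dot test, in seconds"

# ...or to the data frame as a whole
comment(DF) <- "Eye-dot test results; collected weekly from test centres"

print(DF)             # the comments are not shown here
comment(DF$Reaction)  # retrieve them explicitly
comment(DF)
```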

(And I do not understand the question about why str prints the label. Shouldn't str display all attributes other than names, class, and row names?)

IRTFM
  • Thank you for your answer; I had forgotten the comment() function. I will give the roxygen approach a try, but comment() is also handy. When I use str() with Hmisc loaded on an object with a label attribute, it displays the label attribute, whereas my arbitrary Description attribute is not printed. I assume Hmisc has added a method or altered something, but I can't see what. (Should I ask this as a new question? It is going slightly off-topic.) – Simon Jul 09 '12 at 11:38
  • Yes. I think the Hmisc version might gather attributes, whereas `comment` appears to be just for text. You can see the code with: `require(Hmisc); label.data.frame; label.default` – IRTFM Jul 09 '12 at 11:57