65

Is it possible to include .R files in the data directory of my package in the roxygen process?

I have put several .R files in the data directory. When they are sourced with data(), they read in raw data files and perform some transformations.

zx8754
Karsten W.

3 Answers

53

Roxygen can be used anywhere within an R file (in other words, it doesn't have to be followed by a function). It can also be used to document any docType in the R documentation.

So you can just document your data in a separate block (something like this):

#' This is data to be included in my package
#'
#' @name data-name
#' @docType data
#' @author My Name \email{blahblah@@roxygen.org}
#' @references \url{data_blah.com}
#' @keywords data
NULL
Shane
  • 11
    Except you're better off using `NULL` instead of `roxygen()` so that you don't induce a run-time dependency on `roxygen` – hadley Feb 22 '10 at 15:42
  • 3
    @hadley: it might be nice to add an example like this into the roxygen vignette, and make the point about roxygen dependency? I found that to be a little confusing in terms of how to structure the files. – Shane Feb 22 '10 at 16:35
  • 2
    Thank you both Shane and Hadley for the excellent help. I see now much clearer how to use roxygen; and now R CMD check does not complain anymore. One question is left: Do I need to put the data documentation in the R subdirectory? It would be nice to teach roxygenize to look in the data directory, too... – Karsten W. Feb 22 '10 at 16:56
  • 3
    @Karsten: I tend to think that the only thing that should go in the data subdirectory is data. Roxygen provides literate programming as R code, so I like to have that all within my R files. But beyond that you might try this: roxygenize uses an environment variable "R.DIR". Set that to "data" instead and it should process R files in the data directory. @hadley: you could make a simple patch to allow for an R.DIR vector? – Shane Feb 22 '10 at 17:50
  • 3
    @Shane: I've already complained about that to the roxygen devs, and it should change in the next release – hadley Feb 23 '10 at 01:17
  • @hadley I am using roxygen2 now. Is this implemented in this version? I put an `.R` file to describe my data objects in there but `R CMD Check` seems to not parse it. – Faustin Gashakamba Sep 18 '21 at 17:55
50

As of roxygen2 >4.0.0, you can document a data object that is defined elsewhere by placing the roxygen block above the object's name, written as a string:

#' This is data to be included in my package
#'
#' @author My Name \email{blahblah@@roxygen.org}
#' @references \url{data_blah.com}
"data-name"
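For the situation in the question (scripts that read raw files and transform them), one possible layout is sketched below. It assumes the transformation script writes a finished object into the package's data directory and that the roxygen block sits in a file under R/. The file names, the object name `mydata`, and its columns are all hypothetical, and a temporary directory stands in for the package's data directory so the sketch can be run on its own.

```r
# --- Hypothetical build script (for example data-raw/mydata.R) ---
# Stands in for the "read raw data and transform it" step from the question.
raw <- data.frame(id = 1:3, value = c("2.5", "3.1", "4.7"),
                  stringsAsFactors = FALSE)

# Example transformation: convert the text column to numeric.
mydata <- transform(raw, value = as.numeric(value))

# Save the finished object. Inside a package the target would be
# data/mydata.rda; a temporary directory is used here for illustration.
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
save(mydata, file = file.path(out_dir, "mydata.rda"))

# --- Hypothetical documentation file (for example R/data.R) ---
#' Example data set
#'
#' @format A data frame with 3 rows and 2 variables: \code{id} and \code{value}.
"mydata"
```

With this split, the data directory holds only the saved object, and roxygen only needs to process files in the R directory.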
hadley
32

I found it useful to study the examples in the ggplot2 package.

See ggplot2.r on GitHub.

A few things of note:

  • All the Roxygen code for datasets can be included in a single .r file in the R directory of the package.

See, for example, the diamonds dataset:

#' Prices of 50,000 round cut diamonds
#'
#' A dataset containing the prices and other attributes of almost 54,000
#'  diamonds. The variables are as follows:
#'
#' \itemize{
#'   \item price. price in US dollars (\$326--\$18,823)
#'   \item carat. weight of the diamond (0.2--5.01)
#'   \item cut. quality of the cut (Fair, Good, Very Good, Premium, Ideal)
#'   \item colour. diamond colour, from J (worst) to D (best)
#'   \item clarity. a measurement of how clear the diamond is (I1 (worst), SI1, SI2, VS1, VS2, VVS1, VVS2, IF (best))
#'   \item x. length in mm (0--10.74)
#'   \item y. width in mm (0--58.9)
#'   \item z. depth in mm (0--31.8)
#'   \item depth. total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
#'   \item table. width of top of diamond relative to widest point (43--95)
#' }
#'
#' @docType data
#' @keywords datasets
#' @name diamonds
#' @usage data(diamonds)
#' @format A data frame with 53940 rows and 10 variables
NULL

This results in a help file that looks like this:

[Image: roxygen documentation example]

Jeromy Anglim
  • 1
    The Roxygen documentation has probably changed since that answer was written. [This is how](https://github.com/tidyverse/ggplot2/blob/master/R/data.R) it looks now – vlad1490 Jun 22 '19 at 12:41
  • The current approach works because of `LazyData: true` in the DESCRIPTION file. Without lazy loading, those data objects are not defined, and you need to use the 'old' Roxygen2 code. – Eli Holmes Jul 02 '21 at 23:53
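A rough way to check which documentation style applies is to read the lazy-loading setting from the package's DESCRIPTION file. The sketch below assumes the setting referred to in the last comment is the `LazyData` field; the package name and the temporary DESCRIPTION file are hypothetical stand-ins for a real package.

```r
# Write a minimal, hypothetical DESCRIPTION to a temporary file.
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: mypkg",
             "Version: 0.0.1",
             "LazyData: true"),
           desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# TRUE when the field is present and switched on.
lazy <- "LazyData" %in% colnames(desc) &&
  tolower(desc[1, "LazyData"]) %in% c("true", "yes")

# Pick the documentation style described in the answers above.
style <- if (lazy) {
  "string form: roxygen block above the quoted object name"
} else {
  "NULL form: @name and @docType data above NULL"
}
message(style)
```

For a real package, `desc_file` would point at the DESCRIPTION in the package root instead of a temporary file.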