If I load `data(mtcars)`, it comes with a very neat codebook that I can call up using `?mtcars`.

I'm interested in documenting my data in the same way and, furthermore, in saving that neat codebook as a pdf.

Is it possible to save the 'content' of `?mtcars`, and how is it created?

Thanks, Eric

P.S. I did read this thread.

update 2012-05-14 00:39:59 PDT

I am looking for a solution using only R; unfortunately, I cannot rely on other software (e.g. TeX).

update 2012-05-14 09:49:05 PDT

Thank you very much everyone for the many answers.

Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question.

  1. R: I am looking for a solution that is based exclusively on R.
  2. Reproducibility: the codebook can be part of an automated script.
  3. Readability: the text should be easy to read.
  4. Searchability: a file that can be opened with any standard software and searched (this is why I thought pdf would be a good solution, but this is overruled by 1 through 3).

I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.

Eric Fail
  • I didn't know you could look up info on data sets; +1 for a question that made me aware of that. – Tyler Rinker May 14 '12 at 02:59
  • Creating a package that contains your data set will allow you to have the same functionality as mtcars. Is that what you're interested in? – Dason May 14 '12 at 03:49
  • @Dason, I'm interested in finding a solution, using only R, that enables me to automatically create a data codebook (whenever I pull data from a database). I prioritize a simple software setup over formatted pdf output; I might have gotten too optimistic when I saw the documentation that came with mtcars. If I cannot create a pdf, I would scale back and simply create a .txt file, maybe using `Label()` from the Hmisc package. – Eric Fail May 14 '12 at 07:28
  • @Eric, what exactly do you mean by codebook? – cbeleites unhappy with SX May 14 '12 at 11:58

5 Answers

(I'm not completely sure what you're after, but):

  • Like other package documentation, the file for mtcars is an .Rd file. You can convert it into formats other than pdf (e.g. plain ASCII text), but the usual way of producing a pdf does use pdflatex.

  • However, most information in such an .Rd file is written more or less by hand (unless you use yet another R package, like roxygen/roxygen2, to help you generate parts of it automatically).

  • For user data, Noweb is usually much more convenient.
    .Rnw -Sweave-> .tex -pdflatex-> .pdf is certainly the most common workflow with such files. However, you can also use it with e.g. OpenOffice (if that is installed), or with plain ASCII files instead of TeX.

  • Have a look at package knitr, which may be easier to use with pure-ASCII files. (I'm not an expert; I'm just switching over from Sweave.)

  • If html is an option, both Sweave and knitr can work with that.
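To illustrate the first bullet: an installed help page can be rendered to plain ASCII text without any LaTeX, using functions from base R's `tools` package. This is a sketch that assumes looking the page up through `tools::Rd_db()` is acceptable for your setup; the output file name `mtcars.txt` is arbitrary.

```r
# Export the text of ?mtcars to a plain-text file using only base R.
library(tools)

# Rd_db() returns the parsed .Rd pages of an installed package as a list
# named after the source files (e.g. "mtcars.Rd"); strip ".Rd" to match.
db <- Rd_db("datasets")
rd <- db[[which(sub("\\.Rd$", "", names(db)) == "mtcars")]]

# Render as plain text. underline_titles = FALSE avoids the "_\b" overstrike
# sequences that are otherwise used to underline section headings.
out_file <- file.path(tempdir(), "mtcars.txt")
Rd2txt(rd, out = out_file, package = "datasets",
       options = list(underline_titles = FALSE))

# Print the first lines of the resulting text file
cat(head(readLines(out_file), 15), sep = "\n")
```

The same approach should work for your own .Rd files: `tools::parse_Rd("mydata.Rd")` returns an object that can be passed to `Rd2txt()` in place of `rd` (`mydata.Rd` being a hypothetical file name).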

cbeleites unhappy with SX
  • Thank you for answering my question. I am sorry if I was unclear. I've added another update to my question. Please let me know if you have any questions or if it is still unclear. Thanks, Eric – Eric Fail May 14 '12 at 16:54
  • @EricD.Brean: Am I guessing right that you are after some automatic report generation? If so, and you really cannot use any external software, I'm afraid you'll have to stick with some kind of .txt. R rather follows the Unix philosophy (http://en.wikipedia.org/wiki/Unix_philosophy)... – cbeleites unhappy with SX May 14 '12 at 17:45
  • Thanks, I respect that. I much prefer software that [_does one thing and does it well_](http://en.wikipedia.org/wiki/Unix_philosophy#McIlroy:_A_Quarter_Century_of_Unix). Thanks, Eric – Eric Fail May 14 '12 at 18:18

I don't know how to get the pdf of individual data sets, but you can build the pdf of the entire datasets package from the LaTeX version using:

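```r
# Locate the installed 'datasets' package and run R CMD Rd2pdf on it
# (Rd2pdf calls pdflatex, so a LaTeX installation is required)
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")), "CMD",
    "Rd2pdf", shQuote(path)))
```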

I'm not sure about this, but it only makes sense that you'd need some sort of LaTeX distribution, such as MiKTeX. I'm also not sure how this works on other operating systems; mine is Windows, and this works for me.

P.S. This is only a partial answer to your question, since you want to do this for your own data, but if nothing else it may get the ball rolling.

Tyler Rinker
  • Thank you for your answer. The thing is that I need to write something that my professor (and other people in our lab) can use without having to install TeX. I should maybe have emphasized that in my initial post. – Eric Fail May 14 '12 at 03:03
  • @Tyler -- If you are set up with a working LaTeX system, you can get a pdf for an individual data set by doing, e.g., `help(mtcars, help_type="pdf")`. – Josh O'Brien May 14 '12 at 04:39
  • Hmm, very interesting. Didn't know that. It seems, then, that the answer to this problem is a combination of your answer and Dason's. Eric didn't want his colleagues to have to work with a TeX program, but he could create the PDFs himself and distribute them, perhaps via Dropbox so they could be updated. – Tyler Rinker May 14 '12 at 04:42
  • @Eric, you can execute `R CMD Rd2pdf` on any .Rd file. So if you create such a file for your dataset (for example based on the mtcars.Rd file found in the R source code), you can replace `path` in @Tyler's answer with the path to your .Rd file. This is, again, assuming you have a working TeX system at hand. – BenBarnes May 14 '12 at 06:16
  • @BenBarnes, unfortunately I cannot use a TeX solution. I have to use the simplest setup possible. Please see my reply to Dason (above) for more details. – Eric Fail May 14 '12 at 07:38
  • The documentation for `Rd2pdf` doesn't seem to require TeX to create a pdf file. I've never used it, so can someone comment on what sort of pdf output file you get when not specifying TeX in the options? – Carl Witthoft May 14 '12 at 11:13
  • @CarlWitthoft: `R CMD Rd2pdf` does use `pdflatex`. – cbeleites unhappy with SX May 14 '12 at 12:02
  • @Carl, the "Writing R Extensions" manual is cited at the very end of the `Rd2pdf` documentation. The chapter in question contains these two somewhat cryptic sentences: "All [Rd conversion functions] work under Windows. You may need to have installed the tools to build packages from source as described in the “R Installation and Administration” manual, although typically all that is needed is a LaTeX installation." – BenBarnes May 14 '12 at 12:21
  • They are referring to the fact that Rtools, that is, the tools to build packages, comes with tools that connect to a preexisting TeX installation. So, as cbeleites asserted, R CMD Rd2pdf does use pdflatex. There is no existing pure-R way to build pdfs from .Rd or from .tex files. – russellpierce Mar 04 '13 at 15:57

The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.

.Rd files included in the ./man directory of a package are processed when the package is installed and rendered to HTML (or text) by functions in the "tools" package. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
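A minimal sketch of such a layout, assuming a hypothetical data frame `survey` and a hypothetical package name `mydata`: `utils::package.skeleton()` creates the directory structure, saves each listed data object under `data/`, and writes a skeleton .Rd file for it under `man/`. The placeholders in DESCRIPTION and in the .Rd files still have to be filled in by hand before the package is built.

```r
# Hypothetical data set to be shipped in a data-only package
survey <- data.frame(
  id     = 1:3,
  age    = c(34, 51, 27),
  smoker = factor(c("no", "yes", "no"))
)

pkg_root <- tempdir()
# Saves 'survey' under data/ and writes a promptData()-style skeleton
# man/survey.Rd, plus DESCRIPTION, NAMESPACE and a package-level .Rd file.
utils::package.skeleton(name = "mydata", list = "survey",
                        environment = environment(), path = pkg_root,
                        force = TRUE)

pkg_dir <- file.path(pkg_root, "mydata")
print(list.files(pkg_dir, recursive = TRUE))

# Next steps (not run here): replace the ~~ placeholders in DESCRIPTION
# and man/*.Rd, then install with  R CMD INSTALL mydata
# after which library(mydata); ?survey should behave like ?mtcars.
```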

For more information on creating your own packages, writing .Rd files, and building packages: http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".

Edit

And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .

jthetzel

You can't create a PDF with just R; you need to use other software that creates PDFs.

You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the user's browser.
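A minimal sketch of that combination, using only base R; the data frame `survey` and the file names are hypothetical. Instead of a custom function, it simply calls `utils::browseURL()` to open the page. The .Rd skeleton produced by `promptData()` still contains `~~` placeholders that you would replace with your own variable descriptions.

```r
# Hypothetical data set to document
survey <- data.frame(
  id     = 1:3,
  age    = c(34, 51, 27),
  smoker = factor(c("no", "yes", "no"))
)

out_dir   <- tempdir()
rd_file   <- file.path(out_dir, "survey.Rd")
html_file <- file.path(out_dir, "survey.html")

# 1. Write an .Rd skeleton; its \format section already lists every
#    variable and its type, ready to be completed by hand or by a script.
utils::promptData(survey, filename = rd_file)

# 2. Parse the .Rd file and render it as a standalone HTML page.
rd <- tools::parse_Rd(rd_file)
tools::Rd2HTML(rd, out = html_file)

# 3. Open the page in the user's default browser (interactive sessions only).
if (interactive()) utils::browseURL(html_file)
```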

It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.

Joshua Ulrich
  • Thank you for answering my question. I might have been overly optimistic in regard to R's ability to produce pdf files. Apparently this is not possible and, as I write in my second update above, I am looking for a solution based on R alone. Maybe a HTML file would be a good solution. Thanks, Eric – Eric Fail May 14 '12 at 17:00

It looks like an external tool such as LaTeX is always needed if you want to generate a pdf, so I would recommend generating a simple ASCII text file instead. In principle, .Rd files are also ASCII text, but I do not find them particularly readable.

Instead, I would recommend using a plain text ASCII format such as Markdown (which is used, e.g., on Stack Overflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. I think the knitr package is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix R code into the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.

In practice, you can use sprintf to generate character vectors that you write to a file, so that the Markdown text is generated dynamically. Just write the template once, and mark the places for the text you want to add later, like this:

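```r
# Markdown template; %s marks the places filled in later by sprintf()
base_text <- "
First header
============

This document was generated on %s, by %s.
"
# Placeholder values for illustration
some_date <- format(Sys.Date())
some_name <- "Eric"
text_forfile <- sprintf(base_text, some_date, some_name)
```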

Just dump the text in text_forfile to a .md file and you're done; no external tools are needed. See this post on SO for how to dump text to a file.
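Going one step further in the same spirit, the rows of a codebook can be generated from the data itself and written with `writeLines()`. This sketch assumes variable labels are stored as `"label"` attributes (the attribute that `Hmisc::label()` sets, although Hmisc itself is not used here); `survey` and `codebook.md` are hypothetical names.

```r
# Hypothetical data with labels stored as "label" attributes
survey <- data.frame(
  id     = 1:3,
  age    = c(34, 51, NA),
  smoker = factor(c("no", "yes", "no"))
)
attr(survey$id, "label")     <- "Respondent ID"
attr(survey$age, "label")    <- "Age in years"
attr(survey$smoker, "label") <- "Current smoker"

# One Markdown table row per variable: name, label, class, missing count
rows <- vapply(names(survey), function(v) {
  x <- survey[[v]]
  lab <- attr(x, "label")
  if (is.null(lab)) lab <- ""
  sprintf("| %s | %s | %s | %d |", v, lab, class(x)[1], sum(is.na(x)))
}, character(1), USE.NAMES = FALSE)

codebook <- c(
  "Codebook",
  "========",
  "",
  sprintf("Generated on %s.", format(Sys.Date())),
  "",
  "| Variable | Label | Class | Missing |",
  "|---|---|---|---|",
  rows
)

# Plain-text Markdown file: readable as is, searchable, no external tools
md_file <- file.path(tempdir(), "codebook.md")
writeLines(codebook, md_file)
```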

Paul Hiemstra