I want to go through a package and find out which authors are listed in the help file of each function.

I looked for a function that extracts elements from R's help files, but could not find one. The closest I could find is this post, from Noam Ross.

Does such a function exist? (if not, I guess I'll hack Noam's code in order to parse the Rd file, and locate the specific element I'm interested in).

Thanks, Tal.

Potential code example:

get_field_from_r_help(topic = "lm", field = "Description")
# output:
# ‘lm’ is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although ‘aov’ may provide a more convenient interface for these).

Tal Galili
  • possible duplicate of [How to write contents of help to a file from within R?](http://stackoverflow.com/questions/7493843/how-to-write-contents-of-help-to-a-file-from-within-r) – Joshua Ulrich Jul 28 '13 at 14:16
  • Example input and output? – Spacedman Jul 28 '13 at 14:17
  • Joshua - it is not a duplicate, since that only deals with the step of extracting the whole text, and not with how to parse it. Spacedman - in a minute. – Tal Galili Jul 28 '13 at 14:33
  • @TalGalili: You don't need to parse it; you just need to extract the portion you want. Do that by using `grep` for the section header you want, then get all the text until the next section. It might be easier to do using the HTML version of the help, which is described [here](http://stackoverflow.com/q/8918753/271616). – Joshua Ulrich Jul 28 '13 at 14:53
  • Thank you Joshua and Hadley, It looks like enough information for me to play with it. – Tal Galili Jul 28 '13 at 15:30

2 Answers

This document by Duncan Murdoch on parsing Rd files will be helpful, as will this SO post.

From these, you could probably try something like the following:

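getauthors <- function(package) {
    # Rd_db() returns the parsed Rd objects for every help file in the package
    db <- tools::Rd_db(package)
    authors <- lapply(db, function(x) {
        # RdTags() is internal to tools (hence `:::`), so it may change between R versions
        tags <- tools:::RdTags(x)
        if ("\\author" %in% tags) {
            # to return a crazy list of results instead:
            # x[which(tags == "\\author")]
            # return something a little cleaner: flatten to a single string
            paste(unlist(x[which(tags == "\\author")]), collapse = "")
        } else {
            NULL
        }
    })
    # unlist() drops the NULL entries; strip newlines as further cleanup
    gsub("\n", "", unlist(authors))
}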

We can then run this on a package or two:

> getauthors("knitr")
                                                                                     d:/RCompile/CRANpkg/local/3.0/knitr/man/eclipse_theme.Rd 
                                                                                                                     "  Ramnath Vaidyanathan" 
                                                                                         d:/RCompile/CRANpkg/local/3.0/knitr/man/image_uri.Rd 
                                                                                                                    "  Wush Wu and Yihui Xie" 
                                                                                      d:/RCompile/CRANpkg/local/3.0/knitr/man/imgur_upload.Rd 
                                                                              "  Yihui Xie, adapted from the imguR package by Aaron  Statham" 
                                                                                          d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2pdf.Rd 
                                                                                         "  Ramnath Vaidyanathan, Alex Zvoleff and Yihui Xie" 
                                                                                           d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2wp.Rd 
                                                                                                          "  William K. Morris and Yihui Xie" 
                                                                                        d:/RCompile/CRANpkg/local/3.0/knitr/man/knit_theme.Rd 
                                                                                                       "  Ramnath Vaidyanathan and Yihui Xie" 
                                                                                     d:/RCompile/CRANpkg/local/3.0/knitr/man/knitr-package.Rd 
                                                                                                            "  Yihui Xie <http://yihui.name>" 
                                                                                        d:/RCompile/CRANpkg/local/3.0/knitr/man/read_chunk.Rd 
                      "  Yihui Xie; the idea of the second approach came from  Peter Ruckdeschel (author of the SweaveListingUtils  package)" 
                                                                                       d:/RCompile/CRANpkg/local/3.0/knitr/man/read_rforge.Rd 
                                                                                                          "  Yihui Xie and Peter Ruckdeschel" 
                                                                                           d:/RCompile/CRANpkg/local/3.0/knitr/man/rst2pdf.Rd 
                                                                                                               "  Alex Zvoleff and Yihui Xie" 
                                                                                              d:/RCompile/CRANpkg/local/3.0/knitr/man/spin.Rd 
"  Yihui Xie, with the original idea from Richard FitzJohn  (who named it as sowsear() which meant to make a  silk purse out of a sow's ear)" 

And maybe tools:

> getauthors("tools")
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/bibstyle.Rd 
                                                                "  Duncan Murdoch" 
                   D:/murdoch/recent/R64-3.0/src/library/tools/man/checkPoFiles.Rd 
                                                                "  Duncan Murdoch" 
                        D:/murdoch/recent/R64-3.0/src/library/tools/man/checkRd.Rd 
                                                  "  Duncan Murdoch, Brian Ripley" 
                     D:/murdoch/recent/R64-3.0/src/library/tools/man/getDepList.Rd 
                                                                   " Jeff Gentry " 
                      D:/murdoch/recent/R64-3.0/src/library/tools/man/HTMLlinks.Rd 
                                                    "Duncan Murdoch, Brian Ripley" 
            D:/murdoch/recent/R64-3.0/src/library/tools/man/installFoundDepends.Rd 
                                                                     "Jeff Gentry" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/makeLazyLoading.Rd 
                                                   "Luke Tierney and Brian Ripley" 
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/parse_Rd.Rd 
                                                                " Duncan Murdoch " 
                     D:/murdoch/recent/R64-3.0/src/library/tools/man/parseLatex.Rd 
                                                                  "Duncan Murdoch" 
                        D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2HTML.Rd 
                                                  "  Duncan Murdoch, Brian Ripley" 
                 D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2txt_options.Rd 
                                                                  "Duncan Murdoch" 
                   D:/murdoch/recent/R64-3.0/src/library/tools/man/RdTextFilter.Rd 
                                                                "  Duncan Murdoch" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/SweaveTeXFilter.Rd 
                                                                  "Duncan Murdoch" 
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/texi2dvi.Rd 
                     "  Originally Achim Zeileis but largely rewritten by R-core." 
                  D:/murdoch/recent/R64-3.0/src/library/tools/man/tools-package.Rd 
"  Kurt Hornik and Friedrich Leisch  Maintainer: R Core Team R-core@r-project.org" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteDepends.Rd 
                                                                   " Jeff Gentry " 
                 D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteEngine.Rd 
                                            "Duncan Murdoch and Henrik Bengtsson." 
                  D:/murdoch/recent/R64-3.0/src/library/tools/man/writePACKAGES.Rd 
                                                        "  Uwe Ligges and R-core."

Some help files have no author field. The function returns NULL for those, and the unlist call at the end of getauthors silently drops them, but the code could be modified slightly to keep an entry for each of those files.
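One way to keep the files that lack an author field is to return NA instead of NULL. The following is only a minimal sketch: the names rd_author and getauthors_na are made up for illustration, and it reads the "Rd_tag" attribute of each top-level element directly rather than going through the internal tools:::RdTags.

```r
# Hypothetical helper: author text of one parsed Rd object, or NA if absent.
rd_author <- function(rd) {
    # Each top-level section carries its tag in the "Rd_tag" attribute
    tags <- vapply(rd, function(el) {
        tag <- attr(el, "Rd_tag")
        if (is.null(tag)) "" else tag
    }, character(1))
    hit <- which(tags == "\\author")
    if (length(hit) == 0L) return(NA_character_)
    # Flatten the section to one string and collapse runs of whitespace
    txt <- paste(unlist(rd[hit]), collapse = "")
    trimws(gsub("\\s+", " ", txt))
}

# Hypothetical wrapper: one entry per Rd file, NA where there is no \author.
getauthors_na <- function(package) {
    vapply(tools::Rd_db(package), rd_author, character(1))
}
```

Calling getauthors_na("knitr") should then give a character vector with one element per help file, which makes it easy to list the files that have no author field with names(res)[is.na(res)].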

Also, further parsing is going to become a little bit difficult because package authors seem to use this field in very different ways. There's only one author field in devtools. There are a bunch in car, each of which contains an email address. Etc, etc. But this gets you to the available info, which you should be able to work with further.
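As a rough illustration of that further parsing, here is a sketch that pulls email addresses out of author strings and naively splits a list of names. The function names and the regular expressions are my own assumptions, not part of any package, and the patterns are deliberately simple, so free-text fields like the read_chunk one above will not split cleanly.

```r
# A few author strings taken from the output above
authors <- c(
    "Yihui Xie <http://yihui.name>",
    "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>.",
    "Kurt Hornik and Friedrich Leisch  Maintainer: R Core Team R-core@r-project.org"
)

# Simplified email pattern (an assumption; it will not cover every valid address)
email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

# Hypothetical helper: list with the emails found in each string
extract_emails <- function(x) {
    regmatches(x, gregexpr(email_pattern, x))
}

# Hypothetical helper: split "A, B and C" style lists into single names
split_names <- function(x) {
    strsplit(trimws(x), ",\\s*|\\s+and\\s+")
}

emails <- extract_emails(authors)
people <- split_names("  Ramnath Vaidyanathan, Alex Zvoleff and Yihui Xie")
```

Both helpers return a list with one element per input string, so strings without any email simply yield character(0).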

Note: My previous version of this answer provided a solution if you have the full path of an Rd file, but didn't work if you were trying to do this for an installed package. Following Tyler's advice, I've worked out a more complete solution.

Thomas
  • Can you show us an example for a package in which you run through all the .Rd files and grab the author? I tried this approach without success and would love to see this much cleaner approach work. – Tyler Rinker Jul 29 '13 at 00:30
  • @TylerRinker See the update. I tried it on a couple of packages and it seems to work in general. – Thomas Jul 29 '13 at 06:22
  • GREAT answer Thomas - thank you very much. Both informative, and useful :) – Tal Galili Jul 29 '13 at 15:19
  • Dear Thomas, I wanted to share with you that thanks to your code I was able to add the "package_authors" function to the installr package, which just helped me in giving credit to people in the DESCRIPTION file. Thanks! (e.g.: https://github.com/talgalili/installr ) – Tal Galili Aug 21 '13 at 19:46

This is my approach using some suggestions made by others:

package <- "qdap"
funs <- unclass(lsf.str(envir = asNamespace(package)))

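out <- sapply(funs, function(x) {
    # render the help page of x as plain text, one element per line
    x <- try(capture.output(tools:::Rd2txt(utils:::.getHelpFile(as.character(help(x, help_type="text"))))))
    # section headers are underlined with backspaces in the text output
    Auth_lines <- grep("_\bA_\bu_\bt_\bh_\bo_\br(_\bs):", x, fixed = TRUE)
    if (identical(Auth_lines, integer(0))) {
        return(NA)
    }
    # the author text sits two lines below the header; trim surrounding whitespace
    gsub("^\\s+|\\s+$", "", x[Auth_lines + 2])
})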

## To look at just the ones with author fields:
out[!sapply(out, is.na)]

## > out[!sapply(out, is.na)]
##                                                         beg2char 
##                   "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##                                                         bracketX 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                    bracketXtract 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                         char2end 
##                   "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##                                                 cm_df.transcript 
## "DWin, Gavin Simpson and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                            gantt 
##           "DigEmAll (<URL: stackoverflow.com>) and Tyler Rinker" 
##                                                       gantt_wrap 
##     "Andrie de Vries and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                             genX 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                        genXtract 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                             hash 
##      "Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                         name2sex 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                  read.transcript 
##      "Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                      sentCombine 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                        sentSplit 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                              TOT 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                          v.outer 
##   "Vincent Zoonekynd and Tyler Rinker <tyler.rinker@gmail.com>." 
Tyler Rinker
  • Hi Tyler, great answer (you get +1, since Thomas seems to have found some nicer functions to rely on). Thanks :) – Tal Galili Jul 29 '13 at 15:18