22

Probably a pretty basic question, but a friend and I tried to run str(package_name) and R threw an error. Now that I'm looking at it, I'm wondering whether an R package is like a .zip file: a collection of objects, say pictures and songs, but not a picture or song itself.

If I tried to open a zip of pictures with an image viewer, it wouldn't know what to do until I unzipped it - just like I can't call str(forecast) but I can call str(ts) once I've loaded the forecast package into my library...

Can anyone set me straight?

Rich Scriven
d8aninja
  • 4
    You might be more impressed with `ls.str("package:packageName")` – Rich Scriven Jan 13 '15 at 16:25
  • 5
    A package is just a bundle of R functions (with documentation) glued together and organized by DESCRIPTION and NAMESPACE files. A package itself is not an R object. – Roland Jan 13 '15 at 16:27
  • Well, sometimes more than strictly functions. Sometimes there are also data sets and other non-function objects necessary to make the package run – Rich Scriven Jan 13 '15 at 16:28
  • @Roland so if it's not an object....what is it? – d8aninja Jan 13 '15 at 16:29
  • @RichardScriven Sure, but let's cover the most basic case first. – Roland Jan 13 '15 at 16:29
  • @RichardScriven nope - `> library(fpp) > ls.str(fpp) Error in ls.str(fpp) : object 'fpp' not found` – d8aninja Jan 13 '15 at 16:32
  • 1
    @Canuckish - you have to type it as I did, `ls.str("package:fpp")` The function `ls.str` needs to know that you want to view the package contents – Rich Scriven Jan 13 '15 at 16:33
  • 1
    @Canuckish I'm not sure there really is an *object type* for packages, but along the lines of @RichardScriven's comment, I would guess it most closely resembles an `environment`, at least in the sense that you can call things like `ls(name="package:ggplot2")` or `ls.str(name="package:ggplot2")`. – nrussell Jan 13 '15 at 16:33
  • 3
    You might find http://r-pkgs.had.co.nz/package.html helpful – hadley Jan 13 '15 at 19:21
  • Awesome - thanks @hadley! (Someone also referenced this excellent resource, below) – d8aninja Jan 13 '15 at 20:30

4 Answers

22

R packages are generally distributed as compressed bundles of files, in one of two forms. "Binary" packages are preprocessed at a repository, which compiles any C or Fortran source and creates the proper headers. Source packages contain the files required for installation, but installing them requires that the user has the necessary compilers and tools installed where the R build process can find them.

If you read the documentation for a package at CRAN, you will see that it is distributed in a set of compressed formats that vary with the target OS:

Package source:     Rcpp_0.11.3.tar.gz  # the Linux/UNIX targets
Windows binaries:   r-devel: Rcpp_0.11.3.zip, r-release: Rcpp_0.11.3.zip, r-oldrel: Rcpp_0.11.3.zip
OS X Snow Leopard binaries:     r-release: Rcpp_0.11.3.tgz, r-oldrel: Rcpp_0.11.3.tgz
OS X Mavericks binaries:    r-release: Rcpp_0.11.3.tgz
Old sources:    Rcpp archive   # not really a file but a web link

Once installed, an R package has a specified directory structure. The DESCRIPTION file is a text file with specific entries for components that determine whether the local installation meets the dependencies of the package. There are NAMESPACE, LICENSE, and INDEX files. There are directories named '/help', '/html', '/Meta', '/R', and possibly '/libs', '/demo', '/data', '/unitTests', and others.

This is the tree at the top of the ../library/Rcpp package directory:

$ ls
CITATION    NAMESPACE   THANKS      examples    libs
DESCRIPTION NEWS.Rd     announce    help        prompt
INDEX       R       discovery   html        skeleton
Meta        README      doc     include     unitTests
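The same kind of listing can be produced from inside R. This is only a minimal sketch: it uses the `stats` package as a stand-in because it ships with every R installation (Rcpp may not be installed on your machine), and the variable names are just for illustration.

```r
# Locate the installed directory of a package; "stats" is used here
# only because it is always available.
pkg_dir <- system.file(package = "stats")

# Top-level files and directories of the installed package.
top_level <- list.files(pkg_dir)
print(top_level)

# DESCRIPTION is a plain-text file of "Field: value" entries;
# read.dcf() parses it, here restricted to two fields.
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                 fields = c("Package", "Version"))
print(desc)
```

The exact set of entries differs from package to package, so expect the listing to vary.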

So in the "life cycle" of a package, there is initially a series of required and optional files, which are processed by the BUILD and CHECK mechanisms into an installed package, which can then be compressed for distribution and later unpacked into a specified directory tree on the user's machine. See these help pages:

?.libPaths  # also describes .Library
?package.skeleton
?install.packages
?INSTALL

And of course read Writing R Extensions, a document that ships with every installation of R.
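To see where those directory trees live on your own machine, something like the following should work. It is a rough sketch that again assumes the always-installed `stats` package as the example; the variable names are made up for illustration.

```r
# Directories (libraries) in which R looks for installed packages.
lib_dirs <- .libPaths()
print(lib_dirs)

# Full path of one installed package inside such a library.
stats_dir <- find.package("stats")
print(stats_dir)

# Every installed package directory contains a DESCRIPTION file.
has_description <- file.exists(file.path(stats_dir, "DESCRIPTION"))
print(has_description)
```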

IRTFM
19

Your question is:

What type of object is an R package?

Somehow, I’m still missing an answer to this exact question. So here goes:

As far as R is concerned, an R package is not an object. That is, it’s not an object in R’s type system. R is being a bit difficult, because it allows you to write

library(pkg_name)

without requiring you to define pkg_name anywhere beforehand. In contrast, other objects that you use in R have to be defined somewhere, either by you or by some package that has been loaded explicitly or implicitly.

This is unfortunate, and confuses people. Therefore, when you see library(pkg_name), think

library('pkg_name')

That is, imagine the package name in quotes. This does in fact work just as expected. The fact that the code also works without quotes is a peculiarity of the library function, known as non-standard evaluation. In this case, it’s mostly an unfortunate design decision (but there are reasons).
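A small sketch of the difference, using the `splines` package because it ships with R. It assumes you have not defined an object called `splines` yourself; `msg` and `pkg` are just illustrative names.

```r
# A package name is not a bound symbol, so ordinary evaluation fails.
# (Assumes no object named `splines` exists in the session.)
msg <- tryCatch(
  {
    str(splines)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# library() quotes its argument, so these two calls do the same thing.
library(splines)
library("splines")

# To pass the name through a variable, switch off the
# non-standard evaluation with character.only = TRUE.
pkg <- "splines"
library(pkg, character.only = TRUE)
```

Without `character.only = TRUE`, `library(pkg)` would look for a package literally named "pkg".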

So, to repeat the answer: a package isn’t a type of R object [1]. For R, it’s simply a name which refers to a known location in the file system, similar to what you’ve assumed. BondedDust’s answer goes into detail to explain that structure, so I shan’t repeat it here.


[1] For super technical details, see Joshua’s and Richard’s comments below.

Konrad Rudolph
  • 1
    Tsk, tsk... s/lib_name/pkg_name. :) The only thing I might add is that `pkg_name` *is* an object (an unbound symbol in the pairlist containing the function arguments)... though that might be too technical. – Joshua Ulrich Jan 13 '15 at 16:54
  • I think this is a damn fine answer, and I'm stuck between checking it, or @BondedDust's above. As his was the first I checked, I'm going to put it back there. But I do really like this response. Much thanks. – d8aninja Jan 13 '15 at 16:55
  • 1
    Just a note, the `TypeTable` structure [found in src/main/util.c](http://svn.r-project.org/R/trunk/src/main/util.c) shows all the base types. *Package* is not one of them. Thought that might be useful to someone. :) – Rich Scriven Jan 13 '15 at 17:01
  • 1
    Agree the answer is useful. It is written from the perspective of someone viewing the world from an R console and interpreting the errors one gets. Mine was written from the perspective of someone using R in one of the three target OSes. – IRTFM Jan 13 '15 at 17:01
  • @Joshua To be honest, I’m unhappy with that aspect of my answer, in particular since `library` isn’t the only relevant place where you may encounter this (think `pkg::obj`). I’m still mulling over whether to update my answer, or whether this would be more confusing than helpful. – Konrad Rudolph Jan 13 '15 at 17:01
  • Yeah, and `pkg::obj` is even worse because it's less obvious that `::` is a function call. – Joshua Ulrich Jan 13 '15 at 17:11
5

From R's own documentation:

Packages provide a mechanism for loading optional code, data and documentation as needed.…A package is a directory of files which extend R, a source package (the master files of a package), or a tarball containing the files of a source package, or an installed package, the result of running R CMD INSTALL on a source package. On some platforms (notably OS X and Windows) there are also binary packages, a zip file or tarball containing the files of an installed package which can be unpacked rather than installing from sources. A package is not a library.

So yes, a package is not the functions within it; it is a mechanism that lets R use the functions or data that make up the package. Thus, it needs to be loaded first.
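As a hedged illustration of what "loaded" can mean in practice: the `::` operator loads a package without attaching it, while `library()` also puts it on the search path. The sketch below uses the `tools` package (shipped with R, not attached by default); the file name and variable names are made up.

```r
# Using :: loads the package's namespace but does not attach it.
ext <- tools::file_ext("report.tar.gz")
loaded_only <- "tools" %in% loadedNamespaces()
attached_before <- "package:tools" %in% search()

# library() attaches the package, so its exported functions
# can be found by name on the search path.
library(tools)
attached_after <- "package:tools" %in% search()

print(c(loaded = loaded_only,
        attached_before = attached_before,
        attached_after = attached_after))
```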

Avraham
  • Very useful. But as @Roland noted, it's not an object - so is it simply a directory? – d8aninja Jan 13 '15 at 16:31
  • 1
    @Canuckish, no the directory in which the package lives is called the `library`. Which is confusing, as packages are loaded using the `library(foo)` function call. I've fixed the hyperlink in the above answer to point to the proper manual page. – Avraham Jan 13 '15 at 16:34
  • Really was trying to get at the "Type" of a package - it's interesting to me that even though "everything in R is a vector", I have to use a special (list structure?) call like `ls.str("package:package_name")` as recommended above. – d8aninja Jan 13 '15 at 16:41
4

I am reading Hadley's book Advanced-R (Chapter 6.3 - functions, p.79) and this quote will cover you I think:

Every operation is a function call
“To understand computations in R, two slogans are helpful:

Everything that exists is an object.
Everything that happens is a function call.”
— John Chambers

By that logic, library(name_of_package) is a function call that loads the package. Everything that gets loaded, i.e. functions and data sets, is an object that you can use by calling other functions. In that sense, a package is not an object in any of R's environments. Once it is loaded, you can think of it as the collection of objects it contains.
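Here is a minimal sketch of that "collection of objects" view, using the `stats` package, which is attached by default in a normal session. The variable names are only for illustration.

```r
# An attached package shows up on the search path as an environment.
env <- as.environment("package:stats")
print(is.environment(env))

# The package's exported objects live in that environment.
exported <- ls(env)
print(head(exported))

# Each entry is an ordinary R object, for example a function.
print(class(get("median", envir = env)))

# str()-style summary of the objects whose names start with "med".
print(ls.str("package:stats", pattern = "^med"))
```

So `str()` works on the objects inside the package, and `ls.str()` on the environment that holds them, but there is no single object named after the package itself.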

LyzandeR
  • Somewhat along the lines of what I'm looking for, but why would str(package_name) throw an error if str is meant to "Compactly display the internal structure of an R object"? – d8aninja Jan 13 '15 at 16:28
  • 1
    Because it is all about the `environments`. You need to have a look at the above link on the `environments` chapter. If you load a package, its `environment` is added to the search path right after the global environment. The search path is practically where R looks for every object. Unless you load a `package`, R is not able to find where that name is, and hence you get an error. – LyzandeR Jan 13 '15 at 16:31
  • 3
    I don’t think this quote really helps OP. In fact, it’s actually quite misleading because “everything that exists is an object” but, despite appearances to the contrary, the `package_name` in `str(package_name)` does *not* exist as far as R is concerned (and isn’t an object), unless OP has defined it previously. – Konrad Rudolph Jan 13 '15 at 16:39
  • @KonradRudolph But I am mentioning above, that a package is a collection of objects and not an object itself. – LyzandeR Jan 13 '15 at 16:41
  • 1
    Ultimately, the OP was concerned with what type of object the package is. "Collection of objects" is a fine answer, but doesn't answer the question. That said, your response was illustrative, and I value your input. – d8aninja Jan 13 '15 at 16:59