Recently a few neat uses of ggplot2 have come up, and either partial or full solutions have been posted:

ggheat is notable because it rather breaks the ggplot metaphor by just plotting rather than returning an object.

The curly brace solutions are notable because none really fits the ggplot2 high-level concept (e.g. you should be able to specify the range of points you want to bracket, and then somewhere else specify the geom for how that range is displayed: brace, box, purple cow, etc.).

The ggplot2 book (I have read the two online chapters and will order it soon) seems to be about using the grammar and functions rather than writing new ones or extensively extending existing ones.

I would like to learn to add a specific feature or develop a new geom, and do it properly. ggplot2 may not be intended as a general graphics package in the same way that grid or base graphics are, but there are a great many graphs which are only a step or two extension from an existing ggplot2 geom. When these situations come up, I can typically put together enough objects to do something once, but what if I need the same plot a few dozen times? What if other people like it and want to use it--now they have to kludge through the same process each time they want that graph. It seems to me that the proper solution is to add in a stat_heatplot and geom_heatplot, or to add a geom_Tuftebox for Tufte box plots, etc. Yet I've never seen an example of actually extending ggplot2; just examples of how to use it.

What resources exist to dig deeper into ggplot2 and start extending it? I'm particularly interested in a high-level way to specify a range on an axis as described above, but general knowledge about what makes ggplot2 tick is welcome as well.

Absent a coherent guide (which rarely exists for sufficiently advanced tinkering and therefore may not exist here), how would one go about learning about the internals? Inspecting the source is obviously one way, but which functions should one start with?

Ari B. Friedman
  • Unfortunately the `makeMeHadley()` function on my R installation is broken. Perhaps if I tried `make_me_hadley()` instead? – Ari B. Friedman Aug 11 '11 at 16:58
  • I've added a [wishlist page](https://github.com/hadley/ggplot2/wiki/wishlist---feature-requests) to the ggplot2 wiki to list various ideas for extensions. – baptiste Aug 11 '11 at 21:51
  • @AriB.Friedman: Is that a command, or a request? And who is it directed to? :P – naught101 Oct 09 '12 at 23:18
  • @naught101 You have to pick up your mouse and talk to it in a Scottish accent: "Computer. Make me Hadley." – Ari B. Friedman Oct 10 '12 at 00:19

3 Answers

ggplot2 is gradually becoming more and more extensible. The development version, https://github.com/hadley/ggplot2/tree/develop, uses roxygen2 (instead of two separate homegrown systems), and has begun the switch from proto to simpler S3 classes (currently complete for coords and scales). These two changes should hopefully make the source code easier to understand, and hence easier for others to extend (backed up by the fact that pull requests for ggplot2 are increasing).

Another big improvement that will be included in the next version is Kohske Takahashi's improvements to the guide system (https://github.com/kohske/ggplot2/tree/feature/new-guides-with-gtable). As well as improving the default guides (e.g. with elegant continuous colour bars), his changes also make it easier to override the defaults with your own custom legends and axes. This would make it possible to draw the curly braces in the axes, where they probably belong.

The next big round of changes (which I probably won't be able to tackle until summer 2012) will include a rewrite of geoms, stats and position adjustments, along the lines of the sketch in the layers package (https://github.com/hadley/layers). This should make geoms, stats and position adjustments much easier to write, and will hopefully foster more community contributions, such as a geom_tufteboxplot.

hadley
  • Sounds like this will have come to fruition in 1.1.0. Thanks @hadley and the rest of the ggplot2 team. It looks like `vignette("extending-ggplot2")` will explain how to extend. – Ari B. Friedman Sep 13 '15 at 16:38
  • Official extension mechanism now available in 2.0.0: http://blog.rstudio.org/2015/12/21/ggplot2-2-0-0/ – Ari B. Friedman Dec 22 '15 at 11:04

I am not certain that I agree with your analysis. I'll explain why, and will then point you to some resources for writing your own geoms.

ggheat

As far as I can tell, ggheat returns an object of class ggplot. Thus it is a convenient wrapper around ggplot, customised for a specific use case. Although qplot is far more generic, it does in principle the same thing: it is a wrapper around ggplot that makes some informed guesses about the data and chooses sensible defaults. Hadley calls these plot functions, and they are described briefly on page 181 of the ggplot2 book.
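
To make the idea of a plot function concrete, here is a minimal sketch. It is not the real ggheat: the function name `heat_plot`, its arguments and the toy data are assumptions made up for illustration, and it relies on the `.data` pronoun available in recent ggplot2 versions.

```r
library(ggplot2)

# Hypothetical wrapper (not the real ggheat): builds a tile-based heat map
# and returns the ggplot object instead of printing it.
heat_plot <- function(data, x, y, fill) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], fill = .data[[fill]])) +
    geom_tile()
}

# Toy data, invented for this example.
grid_df <- expand.grid(row = 1:3, col = 1:3)
grid_df$value <- grid_df$row * grid_df$col

p <- heat_plot(grid_df, "col", "row", "value")

# Because a ggplot object comes back, the caller can keep adding to it.
p2 <- p + scale_fill_gradient(low = "white", high = "steelblue")
```

Returning the object rather than drawing it is what keeps such a wrapper inside the usual ggplot workflow.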

curly braces

The curly brace solution does exactly what the ggplot philosophy says, i.e. separate data from presentation. In this case, the data is generated by a little custom function and is stored in a data.frame. It is then displayed using a geom that makes sense, i.e. geom_line.
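
As a rough illustration of that separation, the sketch below generates outline coordinates with a small function and leaves the drawing to whatever comes next. The helper `bracket_path` and its arguments are hypothetical, and it draws a plain square bracket rather than a true curly brace to keep the arithmetic obvious.

```r
# Hypothetical helper: returns the outline of a square bracket spanning
# the range [x0, x1], sitting at height y and rising by `height`.
bracket_path <- function(x0, x1, y, height) {
  data.frame(
    x = c(x0, x0, x1, x1),
    y = c(y, y + height, y + height, y)
  )
}

# The data step: only coordinates, no drawing yet.
b <- bracket_path(1, 3, y = 0, height = 0.5)
```

The resulting data.frame can be handed to a geom that draws points in row order (for this shape `geom_path` is the safer choice, since `geom_line` reorders by x), or to `lines()` in base graphics.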

quo vadis?

You have noted (in the r chat room) that you would prefer to have a more generic approach to plotting the curly braces. Something along the following lines (and I paraphrase and extend at the same time):

  • Supply data in the form of bounding box coordinates (i.e. x0, x1, y0 and y1)
  • Specify a "statistic", such as brace, box or whatever
  • Specify a geom, such as geom_custom_shape

This sounds like a nice generalisation and extension of the ideas behind the curly brace solution, and would clearly require writing a new geom. There is an official ggplot wiki, where you can find instructions for creating a new geom.
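
For readers on ggplot2 2.0.0 or later, where the official extension mechanism mentioned in the comments is available, the general shape of a new stat can be sketched as follows. This assumes the ggproto system and follows the convex hull pattern used in `vignette("extending-ggplot2")`; it is not the brace stat itself, and the names `StatChull` and `stat_chull` are simply the illustrative ones chosen here.

```r
library(ggplot2)

# A stat is a ggproto object: compute_group() receives the data for one
# group and returns the rows that the geom should draw.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # Keep only the points on the convex hull (chull() is in grDevices).
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# User-facing constructor: wires the stat to a default geom via layer().
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

A brace or box "statistic" would presumably follow the same pattern, with `compute_group()` turning bounding box coordinates into the outline to draw.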

Andrie
  • +1 & Accept, for the link to the creating a new geom page, which itself links to some good references on `proto`, of which @Gavin explained the importance. – Ari B. Friedman Aug 11 '11 at 18:16
  • Are the instructions linked to above for creating a new geom up to date? And what is recommended practice for including new geoms in packages to be submitted to CRAN? – Frank Harrell Dec 20 '14 at 14:25

Why do you want to extend it? What is the motivation? As I see it, ggplot2 is meant to be a high-level graphics package designed to produce nice figures from a particular data set. It aims to do some things right and to make other things easy, such as scales, legends, etc. ggplot2 is not meant to be a general-purpose graphics tool-kit. Like lattice, it has a particular paradigm in mind and you use it for that purpose.

grid is the underlying graphical toolkit you want to use to do general purpose, customised plotting. And IIRC, it is relatively easy to add grid grobs to lattice or ggplot2 plots/objects, for this sort of arbitrary notation/annotation etc.
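
As a hedged sketch of what adding a grob can look like in current ggplot2, `annotation_custom()` places an arbitrary grid grob inside the plotting area. The label text and the coordinates below are made up for illustration.

```r
library(ggplot2)
library(grid)

# Any grid grob will do; here a simple red text label.
note <- textGrob("heavy cars", gp = gpar(col = "red"))

# annotation_custom() positions the grob using data coordinates.
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  annotation_custom(note, xmin = 25, xmax = 30, ymin = 4.5, ymax = 5)
```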

What doesn't make too much sense is extending ggplot2 or lattice along the lines you are thinking. I don't see why the ggplot2 can't do heatplots as it is? Or am I missing something here?

What would be very useful would be if the data processing guts of ggplot2 or lattice were available for others to write actual plotting code on top of. Hadley has mentioned this somewhere before.

ggplot2, in particular, and lattice are quite difficult code bases to get into and read/understand. ggplot2 uses the proto package for a version of OOP, which means you need to understand what that is doing as well as ggplot2 semantics. lattice is similar, as there is a lot of computing on the language done there that, if you are not familiar with that sort of R programming, can be quite intimidating, daunting and impenetrable!

For grid, I suggest you look at Paul Murrell's R Graphics book, a second edition of which is with the publisher: http://www.stat.auckland.ac.nz/~paul/RG2e/

Edit: The point I was intending to get across was that the interfaces provided by packages like ggplot2 and lattice are necessarily high-level. Extending them is fine as long as they stick to the paradigm/philosophy in use. Heatplots can already be made by using existing geoms; part of the philosophy of the ggplot system is to separate the data from the display/presentation, and to use geoms in interesting ways to produce the desired display.

Wrapping base ggplot + geom calls into a more user friendly function is OK as long as i) it works like ggplot already does and returns an object, and ii) it doesn't have an interface that is too different from the way ggplot works. Developers are free to write whatever code they want, it just isn't helpful to the wider community to provide wrappers that move too far away from the original's workings. That leads to confusion on the part of the user and doesn't foster learning of ggplot2 itself.

The dynamic positioning idea is interesting; you could include these ideas in all plotting packages. You could bolt this into a geom, or alternatively as an external function that modified the input coordinates to produce a new data object that could be used by the relevant geom. That same function could be used for other plotting packages - it wouldn't need to be ggplot-specific.

Gavin Simpson
  • Hadn't seen Murrell's book before. It would've been handy when I was mucking around with the grid internals. Clarified question to address some of your points. – Ari B. Friedman Aug 11 '11 at 18:05
  • I think you will find that Hadley is doing a lot of work to turn at least some of the elements of ggplot into a general purpose toolkit. In particular, the work to generate intelligent scales is a very tough problem. So a lot of work is happening at the moment to make the ggplot scales available as general purpose tools for use by, for example, lattice. (see, for example, this discussion on the ggplot2 mailing list http://groups.google.com/group/ggplot2/browse_thread/thread/8f5a1a7513ef0042) – Andrie Aug 11 '11 at 18:07
  • @Andrie - Thanks for the specifics and the link. I had seen bits of this, but couldn't lay my hands/brain on them so was non-committal in my Answer as to what Hadley had said he would do/ was doing. Making some of the clever internals of ggplot2 available to others would be a very useful contribution indeed. – Gavin Simpson Aug 11 '11 at 18:10
  • You can get your hands on a copy of the `scales` package at github: https://github.com/hadley/scales (It is worth navigating to this page simply to read the README) – Andrie Aug 11 '11 at 18:12
  • Wow - interesting that someone downvoted this --- didn't think I was saying anything too controversial – Gavin Simpson Aug 11 '11 at 22:27
  • @Gavin: I +1'd, not -1'd. But I do think that "I don't see why the ggplot2 can't do heatplots as it is?" misses my point. Of course ggplot2 can do heatplots. But when people add various plot extensions to ggplot2, they do so in an ad hoc way that sometimes doesn't fit with the philosophy (draw a box here with this color, etc. etc. rather than specify the data and aesthetics separately). I'd like to know how to do a step better than that, to save others and my future self the work of re-creating it, and to make it more legible. – Ari B. Friedman Aug 12 '11 at 07:33
  • @gsk3 I've updated my answer a little, to clarify what I meant, esp re the heat plots. I essentially agree with the point about fitting in with the design and philosophy of the package you are wrapping/extending. – Gavin Simpson Aug 18 '11 at 07:07
  • Why make it extensible? Sometimes it is needed, as was the case with my package ggtern (www.ggtern.com) where I wanted to use ggplot2 as a 'platform' for a specialist plotting package. – Nicholas Hamilton Jan 03 '14 at 09:58
  • @NicholasHamilton by "why" I didn't mean "why would you ever?", just "why in this instance?". I realised this didn't mesh with the more general nature of Ari's question, but I think it is important. **ggplot2** has a philosophy of graphical visualization as well as a graphical "tool kit" with which to draw things. Extensions to **ggplot2** need to follow the philosophy, the grammar, otherwise they are just using Hadley's **grid** code, which is fine, so long as they don't go by some *ggFoo* name. From what I've seen, **ggtern** does try to follow that philosophy. – Gavin Simpson Jan 03 '14 at 17:12