When making maps in ArcGIS, the software by default moves point and polygon labels around automatically to avoid overlap, using a proprietary algorithm. Esri refers to this as dynamic labeling. ggplot2 has position_jitter, which is excellent for points (where dynamic placement might introduce systematic bias), but works less well for labels drawn with geom_text.

Here's an example of some problems with jitter that a dynamic labeling algorithm might solve:

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  geom_text(position = position_jitter(height = 1, width = 1))

[Figure: jittered labels with problems noted]

Does such a dynamic labeling feature exist already in ggplot2?

If not, what algorithms exist for doing so and is it possible to implement a position_dynamic in R?

Ari B. Friedman

4 Answers

Check out the new package ggrepel. ggrepel provides geoms for ggplot2 that repel overlapping text labels. It offers repelling counterparts to both geom_text and geom_label (geom_text_repel and geom_label_repel).

[Figure: ggrepel example plot]

Figure is taken from this blog post.
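As a minimal sketch of how this might apply to the mtcars plot from the question (assuming ggplot2 and ggrepel are both installed), geom_text_repel can be swapped in for the jittered geom_text. The seed value is an arbitrary choice made here so that the layout is reproducible:

```r
library(ggplot2)
library(ggrepel)

# Same data and aesthetics as the question's example; only the text geom changes.
p <- ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  # Labels are pushed away from each other and from the data points.
  # seed = 42 is an arbitrary value chosen for reproducibility.
  geom_text_repel(seed = 42)

print(p)
```

The points stay at their true coordinates; only the label positions are adjusted, which avoids the bias concern raised in the question.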

Sukhi

I ran into a similar problem with several plots I have been working on, and wrote a basic package that uses force field simulation to adjust object locations. There is plenty of room for improvement (integration with ggplot, for example), but it seems to get the task accomplished. The following illustrates the functionality:

install.packages("FField", type = "source")
install.packages("ggplot2")
install.packages("gridExtra")
library(FField)
FFieldPtRepDemo()
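To give a rough idea of what a force-based adjustment involves, here is a minimal base-R sketch. It is not FField's implementation: the function name repel_labels and all of its parameters (min_dist, step, spring, iters) are hypothetical, chosen only for illustration. Labels closer than min_dist push each other apart, while a weak spring pulls each label back towards its own point.

```r
# Hypothetical helper (not part of FField or ggplot2).
# Assumes x and y each span a non-zero range and no two points coincide exactly.
repel_labels <- function(x, y, min_dist = 0.1, step = 0.5,
                         spring = 0.02, iters = 200) {
  # Rescale both axes to [0, 1] so distances are comparable across axes.
  rx <- range(x)
  ry <- range(y)
  ax <- (x - rx[1]) / diff(rx)  # anchor (true point) positions
  ay <- (y - ry[1]) / diff(ry)
  lx <- ax                      # labels start on their points
  ly <- ay

  for (i in seq_len(iters)) {
    dx <- outer(lx, lx, "-")    # dx[i, j] = lx[i] - lx[j]
    dy <- outer(ly, ly, "-")
    d <- sqrt(dx^2 + dy^2)
    diag(d) <- Inf              # a label does not repel itself
    d <- pmax(d, 1e-6)          # guard against division by zero

    # Repulsion acts only on pairs closer than min_dist and grows with overlap.
    push <- pmax(min_dist - d, 0) / d
    fx <- rowSums(dx * push)
    fy <- rowSums(dy * push)

    # Move each label along the net repulsion, then pull it back to its anchor.
    lx <- lx + step * fx + spring * (ax - lx)
    ly <- ly + step * fy + spring * (ay - ly)
  }

  # Convert back to the original data units.
  data.frame(x = lx * diff(rx) + rx[1], y = ly * diff(ry) + ry[1])
}

pos <- repel_labels(mtcars$wt, mtcars$mpg)
labelled <- cbind(mtcars, label_x = pos$x, label_y = pos$y)
```

The label_x and label_y columns could then be mapped to the x and y aesthetics of a geom_text layer, leaving geom_point on the original wt and mpg values.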
gregk
  • I really wanted to edit your post with a plot and give you serious cred for implementing a package based on this, but I can't find it on CRAN yet (either via `install.packages` in R 2.15.2 or on the CRAN webpage). Did you only just release it (in which case I'll wait a day or two)? Thanks for writing this package! – Ari B. Friedman Jun 27 '13 at 01:14
  • `package ‘FField’ is not available (for R version 3.0.1)`. Is this package publicly available? – Simon MᶜKenzie Jun 27 '13 at 01:16
  • Indeed: there were some issues with the CRAN submission and I was confused about the earlier package availability. It is available now in source form (http://cran.r-project.org/web/packages/FField/index.html). The sequence for the demo of label repulsion: `install.packages("FField", type = "source"); library(FField); install.packages("ggplot2"); install.packages("gridExtra"); FFieldPtRepDemo()`. The code is quite self-explanatory: `FFieldPtRepDemo`. For now there are no intelligent heuristics for a variety of areas and point distributions, as I wanted to get something helpful to folks quickly. – gregk Jun 28 '13 at 16:36
  • It now seems to be available on CRAN, and it's brilliant. I feel like there is some power in this, but I'll have to work through the documentation a little to fully appreciate it :) – DaveRGP Jul 28 '15 at 14:57

AFAIK, the best that exists is directlabels, available from R-forge and CRAN and with a comprehensive examples page.

This seems a good starting point, but in my opinion has the following negative aspects:

  • Contrary to the ggplot philosophy of separating data and presentation, directlabels returns a ggplot object rather than a geom
  • It only works on the group aesthetic, not on individual points

I glanced at the source code some time ago, and I think it should be reasonably easy to adapt the code to address both of the points I mention.

There is an example of how to use this with ggplot in this question on SO.

Andrie

This isn't anything that can be used directly in ggplot2, but the ordipointlabel() function in package vegan tries to do something similar. It displays data as points and tries to label each point with the appropriate label, using an optimisation algorithm to position the labels next to their point but without overlapping other labels and points.

?ordipointlabel mentions that it is based on pointlabel() in the maptools package, which could be another place to look for inspiration.

Gavin Simpson
  • Nothing is impossible in either R or ggplot, but I agree that the state of play in ggplot at the moment is sub-optimal. – Andrie Aug 09 '11 at 13:30
  • `labcurve` in the `Hmisc` package also does something like this, but it has even more of its own associated infrastructure (i.e. it would be even harder to dig the pieces out for use in `ggplot`) – Ben Bolker Aug 09 '11 at 13:35
  • @Andrie, perhaps my wording was wrong. I meant that my answer had nothing to do with ggplot2 directly. It would be possible to adapt the code in the functions I mention to find coordinates for the labels that could be used in a standard `geom_text()` call – Gavin Simpson Aug 09 '11 at 13:37
  • I think pointlabel() uses simulated annealing. It would be very cute (but not at all trivial) to try to come up with an algorithm based on minimizing an energy -- start the labels at the points and then "repel" them based on overlap ... – Ben Bolker Aug 09 '11 at 17:36
  • @Ben yes, it does. Jari modelled `ordipointlabel()` on that function and he uses SANN too. – Gavin Simpson Aug 09 '11 at 17:50
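As a rough illustration of the simulated-annealing idea discussed in these comments, the base-R sketch below uses optim() with method = "SANN". It is not the objective used by pointlabel() or ordipointlabel(): the energy function label_energy, the fixed label radius r and the threshold min_dist are all assumptions made here for illustration. Each label sits at distance r from its point, and the optimiser searches over the angle of each label.

```r
# Hypothetical energy function (an assumption, not taken from any package).
# theta holds one angle per label; px and py are point coordinates in [0, 1].
label_energy <- function(theta, px, py, r = 0.05, min_dist = 0.08) {
  lx <- px + r * cos(theta)
  ly <- py + r * sin(theta)

  # Label-to-label distances; each pair appears twice, hence the division by 2.
  d_ll <- as.matrix(dist(cbind(lx, ly)))
  diag(d_ll) <- Inf

  # Distances from each label to every point.
  d_lp <- sqrt(outer(lx, px, "-")^2 + outer(ly, py, "-")^2)

  # Penalise labels that crowd each other or sit closer than r to another point.
  sum(pmax(min_dist - d_ll, 0)) / 2 + sum(pmax(r - d_lp, 0))
}

# Rescale a vector to the unit interval.
scale01 <- function(v) (v - min(v)) / diff(range(v))

px <- scale01(mtcars$wt)
py <- scale01(mtcars$mpg)

set.seed(1)                    # SANN is stochastic; fix the seed
start <- rep(0, length(px))    # every label starts to the right of its point
fit <- optim(start, label_energy, px = px, py = py,
             method = "SANN", control = list(maxit = 5000))

# Label positions (still in the rescaled [0, 1] units).
labels <- data.frame(x = px + 0.05 * cos(fit$par),
                     y = py + 0.05 * sin(fit$par))
```

After mapping the coordinates back to data units, they could be passed to a standard `geom_text()` call, as suggested in the comments above.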