
The title is relatively self-explanatory: I would like to know how ggplot decides its default breaks (and hence labels).

From the code below, it looks like the method is the same for each geom:

library(ggplot2)

ggplot(data=mtcars,mapping=aes(x=carb,y=hp,fill=as.factor(gear)))+
  geom_bar(stat="identity",position="dodge")

ggplot(data=mtcars,mapping=aes(x=carb,y=hp,fill=as.factor(gear)))+
  geom_point()
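One way to test that impression is to build both plots and read back the y-axis breaks ggplot2 actually drew. This sketch leans on `ggplot_build()` internals: it assumes ggplot2 3.3.0 or later, where `layout$panel_params[[1]]$y$breaks` holds the major breaks of the first panel (with `NA` for breaks that fall outside it). `drawn_y_breaks()` is just a hypothetical helper name, and these internals are not a stable API, so the field path may differ in other releases.

```r
library(ggplot2)

# Hypothetical helper: return the major y breaks drawn in the first panel.
# Assumes ggplot2 >= 3.3.0 internals; out-of-panel breaks are NA, so drop them.
drawn_y_breaks <- function(p) {
  b <- ggplot_build(p)$layout$panel_params[[1]]$y$breaks
  b[!is.na(b)]
}

p_bar <- ggplot(data = mtcars, mapping = aes(x = carb, y = hp, fill = as.factor(gear))) +
  geom_bar(stat = "identity", position = "dodge")

p_point <- ggplot(data = mtcars, mapping = aes(x = carb, y = hp, fill = as.factor(gear))) +
  geom_point()

bar_breaks <- drawn_y_breaks(p_bar)
point_breaks <- drawn_y_breaks(p_point)

print(bar_breaks)
print(point_breaks)
```

Any difference between the two printed vectors should come from the data range rather than from the algorithm: bars start at 0, so the bar plot's y scale is trained on a range that includes 0, while the points only span the observed `hp` values.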

Any help would be greatly appreciated.

T.Holme

1 Answer


I had the same question myself, and Google brought me to this SO question, so I thought I'd do a bit of digging.

Suppose we plot

library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg, size = hp)) +
  geom_point() 

which gives us the following plot, and we wish to know how the breaks for mpg (10, 15, ..., 35), cyl (4, 5, ..., 8), and hp (100, 150, ..., 300) are derived.

[Plot: scatter of mpg against cyl, with point size mapped to hp]

Focusing on mpg, we inspect the code for scale_y_continuous and see that it calls continuous_scale. Then, calling up ?continuous_scale, we see, under the description for the trans argument, that

A transformation object bundles together a transform, its inverse, and methods for generating breaks and labels.
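To see that bundling in action, we can pull the breaks function straight out of a transformation object. This assumes the default transformation of `scale_y_continuous()` is the identity transformation, `scales::identity_trans()`, which is created with `trans_new()` and so carries the default `extended_breaks()` as its `breaks` element.

```r
library(scales)

# The identity transformation object; its `breaks` element is a function
# that takes data values and returns candidate breaks (n = 5 by default).
t <- identity_trans()

mpg_breaks <- t$breaks(mtcars$mpg)
print(mpg_breaks)
# [1] 10 15 20 25 30 35
```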

Then, looking up ?scales::trans_new, we see that the default value for the breaks argument is extended_breaks(). Following the trail, we find that scales::extended_breaks calls labeling::extended(rng[1], rng[2], n, only.loose = FALSE, ...). Applying this to our data,

with(mtcars, labeling::extended(range(mpg)[1], range(mpg)[2], m = 5))
# [1] 10 15 20 25 30 35

which is what we observe in the plot. This raises the question of why, despite

with(mtcars, labeling::extended(range(hp)[1], range(hp)[2], m = 5))
# [1]  50 100 150 200 250 300 350

we don't observe 50 and 350 in the legend. My understanding is that the answer is related to https://stackoverflow.com/a/13888731/6455166.
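Following the explanation in the comments on this answer, the difference can be imitated by hand: axis breaks are kept if they fall inside the expanded plotting range, but a legend has no expansion, so any break outside the scale limits (here the data range, 52 to 335) is discarded. This is a base-R imitation of that filtering step, not the code ggplot2 itself runs.

```r
library(scales)

hp <- mtcars$hp

# Candidate breaks from the default algorithm (equivalent to the
# labeling::extended() call above with m = 5).
candidates <- extended_breaks()(hp)

# Keep only breaks inside the unexpanded limits, as a legend would.
legend_breaks <- candidates[candidates >= min(hp) & candidates <= max(hp)]
print(legend_breaks)
# [1] 100 150 200 250 300
```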

Weihuang Wong
  • Great digging. Off-topic comment: if you wanted to set the number of y breaks equal to the number of x breaks, for symmetry, you could do this: `xbreaks <- ggplot_build(p)$layout$panel_ranges[[1]]$x.major_source` to extract the breaks, then use them in your ggplot: `p <- p + scale_y_continuous(breaks = xbreaks)` – PatrickT Apr 22 '18 at 14:23
  • I also found `extended()`'s options really helpful, even though the documentation is basically just "rtfa." `Q`, "a set of nice numbers," worked really well for me to scale durations in minutes and hours, namely with `c(15, 20, 30, 60)` - so the scale appropriately tried to give me breaks at those minute-marks. – DHW Oct 21 '19 at 19:47
  • instead of `with(mtcars, labeling::extended(range(mpg)[1], range(mpg)[2], m = 5))`, you can also use `scales::extended_breaks()(mtcars$mpg)` – tjebo Apr 01 '23 at 14:12
  • The limited range in the legend guide is due to the way that breaks are calculated relative to an outer limit that lies within a certain expansion of the data, but this expansion is generally not added to a legend guide, so the outermost breaks are dropped. See https://stackoverflow.com/questions/75907164/default-breaks-in-ggplot2-where-are-the-break-limits-dropped-for-legend-guides – tjebo Apr 02 '23 at 16:05
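Building on the `Q` tip in the comments: `scales::extended_breaks()` passes extra arguments on to `labeling::extended()`, so the preferred "nice numbers" can be handed directly to a ggplot2 scale. This is a sketch with made-up duration data (`durations` and its values are hypothetical); the exact breaks chosen depend on the data range, so no output is shown.

```r
library(ggplot2)
library(scales)

# Hypothetical data: task durations in minutes.
durations <- data.frame(task = letters[1:5], minutes = c(12, 35, 58, 95, 170))

# Q is forwarded to labeling::extended() and replaces its default
# set of nice numbers, c(1, 5, 2, 2.5, 4, 3).
minute_breaks <- extended_breaks(Q = c(15, 20, 30, 60))

print(minute_breaks(durations$minutes))

# The same breaks function can be used as the scale's `breaks` argument.
p <- ggplot(durations, aes(x = task, y = minutes)) +
  geom_col() +
  scale_y_continuous(breaks = minute_breaks)
```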