10

After a bit of searching I am still not happy!

Is there a simple way to make a graph with a y-axis that starts at the origin and clearly shows all your data?

Here's my problem:

library(ggplot2)

set.seed(123)
my.data <- data.frame(x.var = rnorm(100, 50),
                      y.var = rnorm(100, 50, 10))


## Annoying because it doesn't start at origin
ggplot(my.data, aes(x.var, y.var))+
  geom_point()


## Annoying because origin is not at bottom
ggplot(my.data, aes(x.var, y.var))+
  geom_point()+
  expand_limits(y = 0)

## Annoying because point is cut off
ggplot(my.data, aes(x.var, y.var))+
  geom_point()+
  scale_y_continuous(expand = c(0,0))+
  expand_limits(y = 0)

The top answer for the question "Force the origin to start at 0 in ggplot2 (R)" ends with

"You may need to adjust things a little to make sure points are not getting cut off"

Why does this happen? I could manually adjust the axis but I don't want to have to do that every time!

Some dude on the internet has a solution that involves

#Find the current ymax value for upper bound
#(via http://stackoverflow.com/questions/7705345/how-can-i-extract-plot-axes-ranges-for-a-ggplot2-object#comment24184444_8167461 )
gy=ggplot_build(g)$panel$ranges[[1]]$y.range[2]
g=g+ylim(0,gy)

#Handle the overflow by expanding the x-axis
g=g+scale_x_continuous(expand=c(0.1,0))

Which seems complicated for what I feel like is a relatively simple idea. Am I missing something?

Thank you!


EDIT: As of summer 2018, a ggplot2 update broke the fix above. Currently (August 2018), getting the y-max from the plot requires the following:

gy=ggplot_build(g)$layout$panel_scales_y[[1]]$range$range[[2]]

Michael
  • or using @joran's ylimits, use `ggplot(my.data, aes(x.var, y.var)) + geom_point() + coord_cartesian(ylim = c(0, 1.05 * max(my.data$y.var)))` – rawr Nov 19 '14 at 23:26
  • Does http://stackoverflow.com/q/11214012/892313 answer your question, or do you need the bottom of the y-axis to be exactly 0 with no padding (while the top of the y-axis still has the normal padding)? – Brian Diggs Nov 20 '14 at 06:53
  • @BrianDiggs I was hoping to get no padding on the bottom and normal padding at the top. Maybe my desire to get y=0 as the bottom edge of the graph should be re-evaluated. Seems to be a question of how to best display information. – Michael Nov 20 '14 at 17:26
  • @Michael No Padding: Seems to me like a very reasonable request... – PatrickT Dec 04 '14 at 00:55
  • The limits option inside ``scale_x_continuous`` seems to remove the padding, if I understand. http://stackoverflow.com/questions/20050062/ggplot-axis-dont-intersect-at-origin?rq=1 – PatrickT Dec 04 '14 at 01:00

2 Answers

14

I found this issue frustrating, and then read the R help file for expansion(). There is a good ggplot option for this that is facet-friendly, dynamic, and concise.

Quoting from the help file:

mult
vector of multiplicative range expansion factors. If length 1, both the lower and upper limits of the scale are expanded outwards by mult. If length 2, the lower limit is expanded by mult[1] and the upper limit by mult[2].

Note that add is also an option with similar structure. I would solve this issue like so:

ggplot(my.data, aes(x.var, y.var))+
  geom_point()+
  scale_y_continuous(limits = c(0, NA),
                     expand = expansion(mult = c(0, 0.1)))

A big reason to prefer this approach is if you have geoms with different aesthetics (e.g. points and error bars) and facets with free scales: you can still take advantage of ggplot's clever default y-axis behavior, but force x to intersect y at 0 in every panel, and still see the uppermost data points.
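To illustrate the free-scales case, here is a minimal sketch. The `facet.data` data frame and its `group` column are made up for this example and are not from the question. One caveat, as far as I understand the defaults: with `limits = c(0, NA)`, any values below 0 would be dropped with a warning, so this suits data that are non-negative.

```r
library(ggplot2)

set.seed(123)
# Hypothetical data: two groups whose y values differ by an order of magnitude
facet.data <- data.frame(
  x.var = rnorm(200, 50),
  y.var = c(rnorm(100, 50, 10), rnorm(100, 500, 100)),
  group = rep(c("a", "b"), each = 100)
)

p <- ggplot(facet.data, aes(x.var, y.var)) +
  geom_point() +
  # Each panel gets its own y range
  facet_wrap(~ group, scales = "free_y") +
  # Lower limit pinned at 0 with no padding; upper limit left to ggplot2,
  # padded by 10% of the range so the top points are not clipped
  scale_y_continuous(limits = c(0, NA),
                     expand = expansion(mult = c(0, 0.1)))

p
```

Each panel should then start at exactly 0 while keeping its own upper limit, with no per-panel maximum computed by hand.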

Josh O'Brien
Michael Roswell
2

Why not just:

ggplot(my.data, aes(x.var, y.var))+
    geom_point()+
    scale_y_continuous(expand = c(0,0))+
    expand_limits(y = c(0,1.05 * max(my.data$y.var)))
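If repeating the `1.05 * max(...)` arithmetic for every data set is the objection, it could be wrapped in a small helper. This is only a sketch: `origin_limits()` is a hypothetical name, not a ggplot2 function, and the 5% default headroom is an assumption carried over from the snippet above.

```r
library(ggplot2)

set.seed(123)
my.data <- data.frame(x.var = rnorm(100, 50),
                      y.var = rnorm(100, 50, 10))

# Hypothetical helper (not part of ggplot2): y limits running from 0 to
# the data maximum plus a fraction `pad` of headroom.
origin_limits <- function(y, pad = 0.05) {
  c(0, (1 + pad) * max(y, na.rm = TRUE))
}

p <- ggplot(my.data, aes(x.var, y.var)) +
  geom_point() +
  # No automatic padding, so y = 0 sits on the bottom edge
  scale_y_continuous(expand = c(0, 0)) +
  # Headroom at the top comes from the helper instead
  expand_limits(y = origin_limits(my.data$y.var))

p
```

The helper still has to be given the y variable explicitly, so it shortens the per-plot step rather than removing it.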
joran
  • I suppose that would work although it still requires a fix that is different for every variable and data set I want to use it with. – Michael Nov 19 '14 at 23:34
  • @Michael If you want custom axis limits that aren't calculated automatically, you should calculate them yourself. For each data set. In my own code, if I have nit picky axis requirements I will write some code that calculates the ranges I need using all data frames used as input. – joran Nov 19 '14 at 23:36
  • I suppose it just seems odd that by invoking an option to show the origin ggplot will create a plot that cuts off data. I thought I might be missing something, but maybe there's no other way than to manually expand the limits every time. – Michael Nov 19 '14 at 23:40
  • this is no more work than setting the x and y aesthetics in the ggplot call which would also be different for every variable in the data set – rawr Nov 19 '14 at 23:44
  • it is certainly "more work"! You could say it wasn't very different than setting the aesthetics for x and y, but it is certainly another step nonetheless. – Michael Nov 20 '14 at 00:01
  • @Michael I recall some discussion of using Inf when setting axis limits. I can't recall if it was actually implemented. I'm away from my computer so I can't test it. Give Inf and NA a try... – joran Nov 20 '14 at 00:14
  • using ylim(0,NA) takes care of the upper limit but places y=0 above the bottom edge of the graph. To me that seems not ideal as your eyes will visualize the space under points as the magnitude of that value. Maybe I should just forget about it.... – Michael Nov 20 '14 at 17:28