
I have a short R script that plots a few histograms with ggplot2. How can I automatically set the upper y-axis limit from the maximum frequency in the histogram (plus 10%), i.e.:

scale_y_continuous(limits = c(0, ymax * 1.1))

plot = ggplot(data, aes(myo_activity)) +
  geom_histogram(binwidth=0.5, aes(fill=..count..))
plot + scale_x_continuous(expand = c(0,0), limits = c(30,90)) + 
  scale_y_continuous(expand = c(0,0), limits = c(0,140))
Gavin Simpson
moadeep

2 Answers


Since no sample data were provided, this example uses the movies dataset.

With the function ggplot_build() you can get a list containing all the elements used to draw the plot. The computed data for the first layer are in the list element data[[1]], and the count column of that element holds the bar heights of the histogram. You can use the maximum of this column to set the limits of your plot.

plot = ggplot(movies, aes(rating)) + geom_histogram(binwidth=0.5, aes(fill=..count..))
ggplot_build(plot)$data[[1]]
      fill    y count     x     ndensity       ncount      density PANEL group ymin ymax xmin xmax
1  #132B43    0     0  0.75 0.0000000000 0.0000000000 0.0000000000     1     1    0    0  0.5  1.0
2  #142E48  272   272  1.25 0.0323232323 0.0323232323 0.0092535892     1     1    0  272  1.0  1.5
3  #16314B  454   454  1.75 0.0539512775 0.0539512775 0.0154453290     1     1    0  454  1.5  2.0
4  #17344F  668   668  2.25 0.0793820559 0.0793820559 0.0227257263     1     1    0  668  2.0  2.5
5  #1B3A58 1133  1133  2.75 0.1346405229 0.1346405229 0.0385452813     1     1    0 1133  2.5  3.0

plot + scale_y_continuous(expand = c(0,0),
         limits=c(0,max(ggplot_build(plot)$data[[1]]$count)*1.1))
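The same idea can be wrapped in a small helper so it can be reused across several histograms. This is only a sketch: hist_ymax and its headroom argument are hypothetical names, the built-in faithful dataset stands in for the asker's data, and layer_data(p, 1) is used as a shortcut for ggplot_build(p)$data[[1]]. It assumes a reasonably recent ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): height of the tallest bar
# in the first layer of plot p, enlarged by a relative headroom.
hist_ymax <- function(p, headroom = 0.10) {
  bars <- layer_data(p, 1)  # computed layer data, as in ggplot_build(p)$data[[1]]
  max(bars$count) * (1 + headroom)
}

# faithful ships with R and stands in for the real data here.
p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = 5)

ymax <- hist_ymax(p)

# Apply the computed limit; expand = c(0, 0) removes the default padding.
p_limited <- p + scale_y_continuous(expand = c(0, 0), limits = c(0, ymax))
```

Because the limit is read from the built plot, it follows whatever binwidth or weights the histogram layer actually uses.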


Didzis Elferts
  • ggplot_build(plot)$data[[1]] gives me very different output - [,1] [1,] List,12 – moadeep Jan 29 '13 at 14:15
  • @moadeep Please update your question with sample data so that your situation can be reproduced. Also, this plot was made with ggplot2 version 0.9.3 – Didzis Elferts Jan 29 '13 at 16:12
  • The above is the output I got from the commands you typed in your answer – moadeep Jan 29 '13 at 16:17
  • @moadeep What is your version of ggplot2? – Didzis Elferts Jan 29 '13 at 16:18
  • I had an older ggplot2 version. Thanks for your elegant solution – moadeep Jan 30 '13 at 10:34
  • Note that this method also works for weighted histograms, where the usual `hist()`-based solutions do not. ggplot2 weighted histograms use one variable for the x-axis/binning and another for the counts. For example: qplot(wt, data=mtcars, geom="freqpoly", weight=mpg) – zach Feb 18 '13 at 16:42

Personally, I find the 'hist' function to be the most useful for these sorts of calculations. The 'hist' function is super fast and can provide your frequency counts. For your case, you could do something like this:

max(hist(data$myo_activity, breaks=seq(range_Min, range_Max, by=bin_Width), plot=FALSE)$counts)

Here range_Min is the bottom of your theoretical range (e.g. 0), range_Max is an upper limit at or above the top of your theoretical range, and bin_Width is the width of each bin. The breaks must span all of the data, otherwise 'hist' stops with an error.

The expression gives you the maximum count you need to specify the plot range. Note that ggplot2 does its binning in its own stat_bin rather than by calling 'hist', so the result matches the plot only when the breaks line up with the bins ggplot2 uses. I still prefer to call 'hist' directly when I only want the data.
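As a worked illustration of the 'hist' approach, here is a minimal sketch using the built-in faithful dataset in place of the asker's data. The variable names (x, bin_width, breaks) and the choice to round the range outward to multiples of the bin width are assumptions made for the example.

```r
# faithful ships with R; waiting stands in for data$myo_activity.
x <- faithful$waiting
bin_width <- 5

# Round the data range outward to multiples of bin_width so the
# breaks cover every observation (hist() errors otherwise).
breaks <- seq(floor(min(x) / bin_width) * bin_width,
              ceiling(max(x) / bin_width) * bin_width,
              by = bin_width)

# plot = FALSE returns the binned counts without drawing anything.
counts <- hist(x, breaks = breaks, plot = FALSE)$counts

# Tallest bin plus 10% headroom, ready for scale_y_continuous(limits = c(0, ymax)).
ymax <- max(counts) * 1.1
```

To keep the plotted bars consistent with these counts, the same breaks vector can be passed to geom_histogram(breaks = breaks) instead of a binwidth.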

Dinre