28

I'd like to label each bar of a histogram with either the number of counts in that bin or the percent of total counts that are in that bin. I'm sure there must be a way to do this, but I haven't been able to find it. This page has a couple of pictures of SAS histograms that do basically what I'm trying to do (but the site doesn't seem to have R versions): http://www.ats.ucla.edu/stat/sas/faq/histogram_anno.htm

If possible, it would also be nice to have the flexibility to put the labels above or somewhere inside the bars, as desired.

I'm trying to do this with the base R plotting facilities, but I'd be interested in methods to do this in ggplot2 and lattice as well.

eipi10
  • Do you want something like this (http://stackoverflow.com/a/9185168/707145)? – MYaseen208 Feb 16 '12 at 20:06
  • See this one also http://stackoverflow.com/q/6644997/707145 – MYaseen208 Feb 16 '12 at 20:30
  • Yes. That example is a bar plot, but I'm looking to do something similar with a histogram, though I'd like to have fine control over where the text goes, rather than having to put it directly above each bar. – eipi10 Feb 16 '12 at 20:31

2 Answers

49

To label each bar with its count, pass labels = TRUE to hist().

The example below is just slightly adapted from one on the hist() help page:

hist(islands, col="gray", labels = TRUE, ylim=c(0, 45))


Getting percentages is a bit more involved. The only way I know to do that is to directly manipulate the object returned by a call to hist(), as described in a bit more detail in my answer to this similar question:

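histPercent <- function(x, ...) {
   # Compute the bins without drawing anything
   H <- hist(x, plot = FALSE)
   # Rescale density to percent of total counts (assumes equal-width bins)
   H$density <- with(H, 100 * density * diff(breaks)[1])
   labs <- paste0(round(H$density), "%")
   # freq = FALSE makes plot() use the rescaled density as the bar heights
   plot(H, freq = FALSE, labels = labs, ylim = c(0, 1.08 * max(H$density)), ...)
}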

histPercent(islands, col="gray")
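If the bars should keep showing counts while the labels show percentages, one possible variation is to leave the histogram object untouched and only build the label strings from `H$counts`. The sketch below is an illustration, not part of the original answer; the helper name `histCountPercent` is hypothetical, and it relies on the `labels` argument of `plot()` for histogram objects accepting a character vector, as `histPercent()` already does.

```r
# Hypothetical helper: bars scaled by count, labelled with percent of total
histCountPercent <- function(x, ...) {
  H <- hist(x, plot = FALSE)               # bins only, no drawing
  pct <- 100 * H$counts / sum(H$counts)    # percent of all observations per bin
  labs <- paste0(round(pct), "%")
  # Leave some headroom so the label above the tallest bar is not clipped
  plot(H, labels = labs, ylim = c(0, 1.08 * max(H$counts)), ...)
  invisible(pct)                           # return the percentages quietly
}

histCountPercent(islands, col = "gray")
```

Because the percentages come from the counts rather than from the density, this version does not depend on the bins having equal width.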

enter image description here

Josh O'Brien
  • 1
    Thanks Josh. Two follow-ups: (1) What if I want to still plot counts, but label the bars with percents? (2) Is there a way to get fine control over the vertical placement of the numbers (e.g., place them inside the bars at the top, middle or bottom, or just beneath the bars)? Also, is there a way to control the rotation of the text (e.g., rotate 90 deg so large numbers will still fit)? – eipi10 Feb 16 '12 at 20:27
  • @eipi10: See my comments for ggplot2 versions and for more control over the place of counts and percentages. – MYaseen208 Feb 16 '12 at 20:31
  • @eipi10 -- The second part of my answer should be enough to get you started with finer control, if you really want to do this. The key bit is that calls to `hist()` return an object which has all of the pieces you need to get the coordinates for labels that you can then place using `text()` or whatever. – Josh O'Brien Feb 16 '12 at 20:38
  • 2
    Great answer, thanks! Is it possible to adjust the size of the text on top of the bars? I have tried `cex`, `cex.axis`, `cex.main` and `cex.sub` but no joy.. – posdef Mar 04 '15 at 13:27
  • How can we fit a probability distribution to this plot? with *hist* we can simply add density with *lines* to fit the data, here it doesn't work, why? – Mohammad Apr 07 '16 at 15:23
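The follow-up questions in the comments above (vertical placement, rotation, and text size of the labels) can be approached by drawing the labels with `text()` from the pieces of the object that `hist()` returns, rather than through the `labels` argument. The following is a minimal sketch under that assumption; the helper name `labelBars` and its `where` argument are made up for illustration.

```r
# Hypothetical helper: place one label per bar, above it or at mid-height
labelBars <- function(H, labels, where = c("above", "middle"), ...) {
  where <- match.arg(where)
  y <- if (where == "above") H$counts else H$counts / 2
  # pos = 3 draws the text just above the point; NULL centres it on the point
  pos <- if (where == "above") 3 else NULL
  text(H$mids, y, labels = labels, pos = pos, ...)  # ... takes cex, srt, col, etc.
  invisible(y)
}

H <- hist(islands, plot = FALSE)
pct <- paste0(round(100 * H$counts / sum(H$counts)), "%")

# Bars show counts; extra headroom leaves space for labels above the bars
plot(H, col = "gray", ylim = c(0, 1.15 * max(H$counts)))
labelBars(H, pct, where = "above", cex = 0.7)
```

Calling `labelBars(H, pct, where = "middle", srt = 90, cex = 0.7)` instead would put rotated labels at mid-height of each bar. Arguments such as `cex` and `srt` take effect here because they go straight to `text()`.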
6

Adding numbers at the tops of the bars in barplots or histograms distorts the visual interpretation of the bars. Even putting the labels inside the bars near the top creates a fuzzy-top effect that makes it harder for the viewer to interpret the graph properly. If the numbers are of interest, then this creates a poorly laid out table; why not just create a proper table?

If you really feel the need to add the numbers then it is better to put them below the bars or along the top margin so that they line up better for easier comparison and don't interfere with the visual interpretation of the graph. Labels can be added to base graphs using the text or mtext functions and the x locations can be found in the return value from the hist function. Heights for plotting can be computed using the grconvertY function.
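As a rough sketch of the approach described above (the particular margin line, the 0.97 height and the 20% headroom are arbitrary choices for illustration), the bin midpoints in the value returned by `hist()` give the x locations, `mtext()` writes a row in the top margin, and `grconvertY()` turns a relative height in the plot region into a user-coordinate y value for `text()`:

```r
H <- hist(islands, plot = FALSE)

# Extra headroom so a row of labels fits above the tallest bar
plot(H, col = "gray", main = "", ylim = c(0, 1.2 * max(H$counts)))

# Row of counts in the top margin, centred on the bin midpoints
mtext(H$counts, side = 3, line = 0.5, at = H$mids, cex = 0.8)

# Row of percentages at 97% of the plot-region height, in user coordinates
y_row <- grconvertY(0.97, from = "npc", to = "user")
pct <- paste0(round(100 * H$counts / sum(H$counts)), "%")
text(H$mids, y_row, labels = pct, cex = 0.8)
```

Since every label in a row shares the same height, the numbers line up for comparison and stay clear of the bar tops.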

Greg Snow
  • 1
    Thanks Greg. I agree with your comments about text placement. See my comment to Josh's answer regarding fine control of text placement. – eipi10 Feb 16 '12 at 20:32
  • I also find that plotting a grid instead of the value numbers is sufficient to give a fair representations of the real values – Vasile Dec 14 '15 at 15:39