80

I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:

hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))

This gives me a histogram, but the density between 0 and 1 is so great (about a million values difference) that you can barely make out any of the other bars.

Then I've tried doing:

mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")

It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.

Jaap
Weegee
  • Related older question: [Make y-axis logarithmic in histogram using R](https://stackoverflow.com/questions/7828248/make-y-axis-logarithmic-in-histogram-using-r) – smci May 30 '17 at 05:27

7 Answers

69

A histogram is a poor-man's density estimate. Note that in your call to hist() using default arguments, you get frequencies not probabilities -- add ,prob=TRUE to the call if you want probabilities.

As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:

plot(mydata_hist$counts, log="y", type='h', lwd=10, lend=2)

gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.

Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
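As a minimal sketch of that last suggestion, the snippet below histograms the log of some simulated data. The vector `x` is a made-up stand-in (not the asker's `mydata$V3`), and log10 is chosen only for readability. Note that this transforms the values along the x-axis rather than the counts on the y-axis.

```r
# Hypothetical sample data: strictly positive with a heavy right tail
set.seed(42)
x <- rlnorm(1000, meanlog = 0, sdlog = 2)

# log() of zero is -Inf, so keep strictly positive values only
positive <- x[x > 0]

# Histogram of the log-transformed values; hist() returns the bin
# information invisibly, so it can be captured while plotting
h <- hist(log10(positive), xlab = "log10(x)", main = "Histogram of log10(x)")
```

If the data contain zeros, they have to be dropped or shifted before taking the log, which changes what the first bin means.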

Dirk Eddelbuettel
  • Excellent! How can I modify the axis on the bottom though? Rather than showing 1, 2, 3, 4, 5, 6, I'd like to show 0 <= 1, 1 <= 2, etc. – Weegee Aug 07 '09 at 16:14
  • Suppressing the axis in plot() and making an explicit call to axis(), giving the 'where' and 'what', allows you to do that. – Dirk Eddelbuettel Aug 07 '09 at 16:21
  • Unfortunately "type = 'h' " doesn't seem to work anymore (wow this answer is from nearly 12 years ago!!) – BGranato Apr 12 '21 at 19:54
  • That would surprise me. Base R plot functions should not change. And indeed, this works just fine for me as it should: `set.seed(123); z <- cumsum(runif(100)); plot(z, type='h')`. – Dirk Eddelbuettel Apr 12 '21 at 20:04
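The axis suggestion from the comments above could be sketched roughly as follows. The data vector `v3` is simulated as a stand-in for `mydata$V3`, and the "0-1" style labels in `bin_labels` are one possible choice, not something the original answer prescribes.

```r
# Simulated stand-in for mydata$V3 (assumption: values between 0 and 25,
# heavily concentrated between 0 and 1)
set.seed(1)
v3 <- c(runif(10000, 0, 1), runif(500, 1, 5), runif(20, 5, 25))

breaks <- c(0, 1, 2, 3, 4, 5, 25)
h <- hist(v3, breaks = breaks, plot = FALSE)

# One label per bin: "0-1", "1-2", ..., "5-25"
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Draw the bars with the default x-axis suppressed (xaxt = "n"),
# then add an axis with explicit positions ('where') and labels ('what')
plot(h$counts, log = "y", type = "h", lwd = 10, lend = 2,
     xaxt = "n", xlab = "V3", ylab = "Frequency")
axis(1, at = seq_along(h$counts), labels = bin_labels)
```

The bars sit at positions 1 to 6, so `seq_along(h$counts)` supplies the tick positions and `bin_labels` the text shown under each bar.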
56

Another option would be to use the package ggplot2.

ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
pjvandehaar
Thierry
  • This is a very good answer and automates a lot of the details that can always be tuned later. Thank you! – Sun Bee Jul 06 '19 at 07:00
11

It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
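A frequency polygon on a log y-axis might look like the sketch below. It uses base R rather than ggplot2 to stay self-contained, and the data vector `v3` is simulated as a stand-in for `mydata$V3`; both choices are assumptions made for illustration.

```r
# Simulated stand-in for mydata$V3
set.seed(1)
v3 <- c(runif(10000, 0, 1), runif(500, 1, 5), runif(20, 5, 25))

breaks <- c(0, 1, 2, 3, 4, 5, 25)
h <- hist(v3, breaks = breaks, plot = FALSE)

# Empty bins cannot be drawn on a log axis, so keep only non-zero counts
keep <- h$counts > 0

# Points joined by lines at the bin midpoints; nothing is anchored at
# zero, so the log scale poses no problem
plot(h$mids[keep], h$counts[keep], type = "b", log = "y", pch = 20,
     xlab = "V3 (bin midpoint)", ylab = "Frequency")
```

Because the last bin is much wider than the others, its midpoint (15) lies far from the rest; plotting against bin index instead is an equally reasonable choice.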

hadley
10

Run the hist() function without making a graph, log-transform the counts, and then draw the figure.

hist.data = hist(my.data, plot=FALSE)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)

It should look just like the regular histogram, but the y-axis will be log2 Frequency.

user2596153
  • To prevent -Inf you'll have to use the following: `hist.data$counts[hist.data$counts>0] <- log(hist.data$counts[hist.data$counts>0], 2)` – kory Mar 22 '17 at 16:40
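Putting the answer and the zero-count guard from the comment together, a minimal sketch might look like this. The vector `my.data` is simulated here so that some bins are empty, and the `ylab` text is an illustrative choice.

```r
# Simulated data with one isolated value, so several bins are empty
set.seed(1)
my.data <- c(runif(1000, 0, 1), 10)

hist.data <- hist(my.data, plot = FALSE)

# Log-transform only the non-empty bins; empty bins stay at 0
# instead of becoming -Inf
nonzero <- hist.data$counts > 0
hist.data$counts[nonzero] <- log(hist.data$counts[nonzero], 2)

# Relabel the y-axis, since the bar heights are no longer raw frequencies
plot(hist.data, ylab = "log2(Frequency)")
```

One caveat: a bin holding exactly one value also ends up at height 0, because log2(1) is 0, so it becomes indistinguishable from an empty bin.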
10

Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:

buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
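# barplot() needs one name per bar (6 bins, not 7 break points), so label each bin by its range
bp <- barplot(mydata_hist$counts, log="y", col="white", names.arg=paste(head(buckets, -1), tail(buckets, -1), sep="-"))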
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)

The last line is optional; it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.

I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.

Quinn Taylor
3

Here's a pretty ggplot2 solution:

library(ggplot2)
library(scales)  # makes pretty labels on the x-axis

breaks=c(0,1,2,3,4,5,25)

ggplot(mydata,aes(x = V3)) + 
  geom_histogram(breaks = log10(breaks)) + 
  scale_x_log10(
    breaks = breaks,
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  )

Note that to set the breaks in geom_histogram, they had to be log10-transformed to work with scale_x_log10.

3

I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.

The original problem would be solved with:

myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")

The function:

myhist <- function(x, ..., breaks="Sturges",
                   main = paste("Histogram of", xname),
                   xlab = xname,
                   ylab = "Frequency") {
  xname = paste(deparse(substitute(x), 500), collapse="\n")
  h = hist(x, breaks=breaks, plot=FALSE)
  plot(h$breaks, c(NA,h$counts), type='S', main=main,
       xlab=xlab, ylab=ylab, axes=FALSE, ...)
  axis(1)
  axis(2)
  lines(h$breaks, c(h$counts,NA), type='s')
  lines(h$breaks, c(NA,h$counts), type='h')
  lines(h$breaks, c(h$counts,NA), type='h')
  lines(h$breaks, rep(0,length(h$breaks)), type='S')
  invisible(h)
}

Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.

Alice Purcell