Main issue: I want to display the data from 0 to 1.0 as upward bars (starting from 0), with the y-axis intervals log spaced rather than equally spaced.

I am trying to display the column labeled "mean" in the dataset below as a bar plot in ggplot. Because the numbers are very small, I would like to show the y-axis on a log scale rather than log transform the data itself. In other words, I want upright bars with y-axis labels 0, 1e-8, 1e-6, 1e-4, 1e-2 and 1e-0 (i.e. from 0 to 1.0, but with log-scaled intervals).

The solution below does not work as the bars are inverted.

> print(df)
        type         mean           sd           se snp
V7    outer 1.596946e-07 2.967432e-06 1.009740e-08   A
V8    outer 7.472417e-07 6.598652e-06 2.245349e-08   B
V9    outer 1.352327e-07 2.515771e-06 8.560512e-09   C
V10   outer 2.307726e-07 3.235821e-06 1.101065e-08   D
V11   outer 4.598375e-06 1.653457e-05 5.626284e-08   E
V12   outer 5.963164e-07 5.372226e-06 1.828028e-08   F
V71  middle 2.035414e-07 3.246161e-06 1.104584e-08   A
V81  middle 9.000131e-07 7.261463e-06 2.470886e-08   B
V91  middle 1.647716e-07 2.875840e-06 9.785733e-09   C
V101 middle 3.290817e-07 3.886779e-06 1.322569e-08   D
V111 middle 6.371170e-06 1.986268e-05 6.758752e-08   E
V121 middle 8.312429e-07 6.329386e-06 2.153725e-08   F

The code below properly generates the grouped barplot with error bars:

ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_bar(stat="identity",position=position_dodge(),width=0.5) + 
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45)) 

However, I want to make the y-axis log scaled and so I add in scale_y_log10() as follows:

ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_bar(stat="identity",position=position_dodge(),width=0.5) +
  scale_y_log10() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se),width=.3, position=position_dodge(.45))

Strangely, the bars now hang down from the top of the plot. I simply want them to go up as normal, and I don't know what I am doing wrong.

Thank you

Lee Sande
  • Barplots are defined in terms of zero. You have very small numbers. The log of very small numbers is negative. The bar goes from zero down to your negative numbers. – Axeman Dec 08 '16 at 20:41
  • I am slightly confused because I am not log transforming the data itself so the numbers are still positive. Moreover, if you plot the data you will see that the y-axis units are still from 1e-6 (bottom) increasing up to 1e-3 but strangely the bars are "falling" from the top to the bottom i.e. from larger numbers to smaller numbers. I just want to view the data on a log scale but not transform the data itself. I hope I am making sense – Lee Sande Dec 08 '16 at 20:52
  • You are absolutely log transforming the data. `scale_y_log10()` log transforms the data before plotting it. – hrbrmstr Dec 08 '16 at 20:59
  • OK, then there must be a bug in ggplot because the y-tick labels are positive numbers (same values as I see when I don't use the scale_y_log10()). – Lee Sande Dec 08 '16 at 21:00
  • Nope, it reverse transforms the data for the tick labels. Definitely not a bug. – hrbrmstr Dec 08 '16 at 21:01
  • I'm also really concerned about you doing this. Bar charts are absolutely supposed to start at 0, but they can't with a log 10 scale (`log10(0)` is `-Inf`), and most folks will draw very bad conclusions since they'll be comparing the bars linearly in their heads and will have to constantly remember it's a log scale and try to compensate for it. If the issue is the YUGE difference between E and the other bar pairs, you could use faceting with a free Y scale to compensate for that and still make it compact. – hrbrmstr Dec 08 '16 at 21:05
  • From the comments, it appears that I was unclear, so I have added more information and changed the title. I want to display the data from 0 to 1.0 as an upward bar (starting from 0) but do not want the intervals to be equally spaced but log spaced. I hope my explanation is clear now. – Lee Sande Dec 08 '16 at 22:22
  • But as @hrbrmstr already pointed out, `log10(0)` is `-Inf`. So you're asking for a plot in which the bars extend from negative infinity up to the logged values of your data. – eipi10 Dec 09 '16 at 02:06
  • You might find [this SO answer](http://stackoverflow.com/a/9507037/496488) helpful. – eipi10 Dec 09 '16 at 02:21
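
The behaviour described in the comments above can be checked with plain arithmetic. This is a minimal base R sketch, not ggplot2's actual code: it assumes, as Axeman's comment describes, that a bar spans from 0 to its value in the transformed (log10) space. The names `means`, `logged`, `bar_top` and `bar_bottom` are made up for the illustration.

```r
# Two of the means from the question (snp A and E, type "outer").
means <- c(A = 1.596946e-07, E = 4.598375e-06)

# scale_y_log10() works with log10 of the values, which is negative here.
logged <- log10(means)

# Assumption for illustration: a bar spans from 0 to its value in the
# transformed space, so for negative values the bar hangs down from 0.
bar_top    <- pmax(logged, 0)
bar_bottom <- pmin(logged, 0)

# Back on the axis labels, log-space 0 is 10^0 = 1, so each bar runs from
# 1 at the top down to the mean at the bottom.
data.frame(axis_top = 10^bar_top, axis_bottom = 10^bar_bottom)
```

Under that assumption the bars are not really inverted: they are anchored at 1 on the axis (0 in log space), and every mean in the data lies below that.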

1 Answer

Here's a bit of hacking to show what happens if you try to get bars that start at zero on a log scale. I've used geom_segment for illustration, so that I can create "bars" (wide line segments, actually) extending over arbitrary ranges. To make this work, I've also had to do all the dodging manually, which is why the x mapping looks weird.

In the example below, the scale goes from y=1e-20 to y=1. The y-axis intervals are log scaled, meaning that the physical distance from, say 1e-20 to 1e-19 is the same as the physical distance from, say, 1e-8 to 1e-7, even though the magnitudes of those intervals differ by a factor of one trillion.

Bars that go down to zero can't be displayed, because zero on the log scale is an infinite distance below the bottom of the graph. We could get closer to zero by, for example, changing 1e-20 to 1e-100 in the code below. But that will just make the already-small physical distances between the data values even smaller and thus even harder to distinguish.

The bars are also misleading in another way, because, as @hrbrmstr pointed out, our brains treat distance along the bar linearly, but the magnitude represented by each increment of distance along the bar changes by a factor of 10 about every few millimeters in the example below. The bars simply aren't encoding meaningful information about the data.

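# as.numeric(factor(...)) gives 1, 2, ... whether the columns are factors or character
# The 0.3*(type - 1.5) offset does the dodging by hand
ggplot(data=df, aes(x=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5), 
                    y=mean, colour=type)) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3) +
  # "bars" drawn as thick segments from the lower limit up to the mean
  # (on ggplot2 >= 3.4.0, linewidth=5 replaces size=5 for lines)
  geom_segment(aes(xend=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5),
                   y=1e-20, yend=mean), size=5) +
  scale_y_log10(limits=c(1e-20, 1), breaks=10^(-100:0), expand=c(0,0)) +
  scale_x_continuous(breaks=1:6, labels=LETTERS[1:6])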

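
The points about zero and about interval spacing can be checked numerically. The following is a rough base R sketch using the smallest and largest means from the question; the variable names are made up for the illustration.

```r
# Zero has no finite position on a log10 axis.
log10(0)                                  # -Inf

# Each decade occupies the same length on the axis, whatever its magnitude.
d_small <- log10(1e-19) - log10(1e-20)    # one axis unit
d_large <- log10(1e-7)  - log10(1e-8)     # also one axis unit

# Range covered by the means in the question, in decades.
data_span <- log10(6.371170e-06) - log10(1.352327e-07)

# Share of the axis height taken up by the data for two lower limits.
share_1e20  <- data_span / (log10(1) - log10(1e-20))
share_1e100 <- data_span / (log10(1) - log10(1e-100))
c(share_1e20 = share_1e20, share_1e100 = share_1e100)
```

With a lower limit of 1e-20 the data occupy roughly 8% of the axis height, and with 1e-100 less than 2%, which is why pushing the limit closer to zero makes the values harder to tell apart.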

If you want to stick with a log scale, maybe plotting points would be a better approach:

pd = position_dodge(.5)
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se, colour=type), width=.3, position=pd) +
  geom_point(aes(colour=type), position=pd) +
  scale_y_log10(limits=c(1e-7, 1e-5), breaks=10^(-10:0)) +
  annotation_logticks(sides="l")

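
Another option, raised by @hrbrmstr in the comments, is to keep a linear scale and facet with a free y scale, so the bars still start at zero. Here is a minimal sketch of that idea, assuming the data frame is rebuilt by hand from the printout in the question (only `type`, `snp`, `mean` and `se` are kept); the layout choices such as `nrow = 1` are mine.

```r
library(ggplot2)

# Data copied from the question (mean and se only).
df <- data.frame(
  type = rep(c("outer", "middle"), each = 6),
  snp  = rep(LETTERS[1:6], times = 2),
  mean = c(1.596946e-07, 7.472417e-07, 1.352327e-07, 2.307726e-07,
           4.598375e-06, 5.963164e-07, 2.035414e-07, 9.000131e-07,
           1.647716e-07, 3.290817e-07, 6.371170e-06, 8.312429e-07),
  se   = c(1.009740e-08, 2.245349e-08, 8.560512e-09, 1.101065e-08,
           5.626284e-08, 1.828028e-08, 1.104584e-08, 2.470886e-08,
           9.785733e-09, 1.322569e-08, 6.758752e-08, 2.153725e-08)
)

# One panel per snp, each with its own linear y-axis that includes zero.
p <- ggplot(df, aes(x = type, y = mean, fill = type)) +
  geom_col(width = 0.5) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.3) +
  facet_wrap(~ snp, nrow = 1, scales = "free_y")

p
```

The trade-off is that bar heights are no longer comparable across panels, only within each one, so the axis labels have to be read panel by panel.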

eipi10