Using the data.frame below, I want a bar plot with a log-transformed y axis.

I got this plot

[Image: plot]

using this code

ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
  geom_bar(position="dodge", stat="identity",
           width = 0.7,
           size=.9)+
  geom_errorbar(aes(ymin=ymin,ymax=ymax),
                size=.25,   
                width=.07,
                position=position_dodge(.7))+
  theme_bw()

To log-transform the y axis so that the "low" level in B and D, which is close to zero, becomes visible, I used

+scale_y_log10()

which resulted in

[Image: plot]

Any suggestions on how to log-transform the y axis of the first plot?

By the way, some values in my data are close to zero, but none of them is zero.

UPDATE

Trying this suggested answer by @computermacgyver

ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
  geom_bar(position="dodge", stat="identity",
           width = 0.7,
           size=.9)+
  scale_y_log10("y",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))+
  geom_errorbar(aes(ymin=ymin,ymax=ymax),
                size=.25,   
                width=.07,
                position=position_dodge(.7))+
  theme_bw()

I got

[Image: plot]

DATA

dput(df)
structure(list(id = structure(c(7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 
2L, 6L, 6L, 6L, 5L, 5L, 5L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A", 
"B", "C", "D", "E", "F", "G"), class = "factor"), var = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L), .Label = c("high", "medium", "low"), class = "factor"), 
    ymin = c(0.189863418, 0.19131948, 0.117720496, 0.255852069, 
    0.139624146, 0.048182771, 0.056593774, 0.037262727, 0.001156667, 
    0.024461299, 0.026203592, 0.031913077, 0.040168571, 0.035235902, 
    0.019156667, 0.04172913, 0.03591233, 0.026405094, 0.019256055, 
    0.011310755, 0.000412414), ymax = c(0.268973856, 0.219709677, 
    0.158936508, 0.343307692, 0.205225352, 0.068857143, 0.06059596, 
    0.047296296, 0.002559633, 0.032446541, 0.029476821, 0.0394, 
    0.048959184, 0.046833333, 0.047666667, 0.044269231, 0.051, 
    0.029181818, 0.03052381, 0.026892857, 0.001511628), ymean = c(0.231733739333333, 
    0.204891473333333, 0.140787890333333, 0.295301559666667, 
    0.173604191666667, 0.057967681, 0.058076578, 0.043017856, 
    0.00141152033333333, 0.0274970166666667, 0.0273799226666667, 
    0.0357511486666667, 0.0442377366666667, 0.0409452846666667, 
    0.0298284603333333, 0.042549019, 0.0407020586666667, 0.0272998796666667, 
    0.023900407, 0.016336106, 0.000488014)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -21L), .Names = c("id", 
"var", "ymin", "ymax", "ymean"))
shiny
  • what is I symbol in plot 1? – Hardik Gupta Oct 09 '17 at 05:15
  • @Hardikgupta could you please clarify which I you mean in plot? – shiny Oct 09 '17 at 05:22
  • the 'I' symbol above every bar below the star in plot1 – Hardik Gupta Oct 09 '17 at 05:35
  • See answer by [@computermacgyver](https://stackoverflow.com/a/18526649/680068) – zx8754 Oct 09 '17 at 07:28
  • @zx8754 Many thanks for your time and help. I tried the answer you suggested https://stackoverflow.com/a/18526649/5420677. However, it gave me upside down plot. Please, check the edit. – shiny Oct 09 '17 at 07:54
  • @Hardikgupta Thanks. These are error bars – shiny Oct 09 '17 at 07:55
  • So you don't want log transformation, but only want to display yaxis labels as `10^n`? – zx8754 Oct 09 '17 at 08:03
  • @zx8754 I need to show the levels that are close to zero through log transformation. In my case, in B and D variables, I want to show the "low" level which is so close to zero. Please, check the difference between plot2 and plot3 and how "low" level in B and D is not close to zero anymore in plot3 but the orientation changed. – shiny Oct 09 '17 at 08:07

3 Answers

As @Miff has written, bars are generally not useful on a log scale. With bar plots, we compare the heights of the bars to one another. To do this, we need a fixed baseline from which to compare, usually 0, but log(0) is negative infinity.
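To make the baseline problem concrete, here is a minimal base R sketch. It uses three `ymean` values from the question's data (id A); the variable names `y`, `log_y`, and `bar_length` are only illustrative. It assumes the bars are anchored at 0 in log space (that is, at 1 on the original scale), which is presumably why the bars in the question's log-scaled attempts hang downward.

```r
# Three ymean values from the question's data (id A, levels high/medium/low)
y <- c(high = 0.295301559666667, medium = 0.173604191666667, low = 0.057967681)

# The usual bar baseline, 0, has no finite position on a log axis
log10(0)  # -Inf

# Every value is below 1, so every log10 value is negative
log_y <- log10(y)
log_y

# Assumption: bars are anchored at log10(1) = 0, so they extend downward
# and the smallest value ends up with the longest bar
bar_length <- abs(log_y - log10(1))
names(which.max(bar_length))
```

In other words, the visual length of each bar says more about the chosen anchor than about the data.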

So, I would strongly suggest that you consider using geom_point() instead of geom_bar(). I.e.,

library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format(), math_format()

ggplot(df, aes(x=id, y=ymean, color=var)) +
  geom_point(position=position_dodge(.7))+
  scale_y_log10("y",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))+
  geom_errorbar(aes(ymin=ymin,ymax=ymax),
                size=.25,
                width=.07,
                position=position_dodge(.7))+
  theme_bw()

[Image: dot plots are better than bars with log scale]

If you really, really want bars, then you should use geom_rect instead of geom_bar and set your own baseline. That is, the baseline for geom_bar is zero but you will have to invent a new baseline in a log scale. Your Plot 1 seems to use 10^-7.

This can be accomplished with the following, but again, I consider this a really bad idea.

library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format(), math_format()

# ymin=1e-7 is the invented baseline (10^-7); any other choice changes the bar heights
ggplot(df, aes(xmin=as.numeric(id)-.4,xmax=as.numeric(id)+.4, x=id, ymin=1e-7, ymax=ymean, fill=var)) +
  geom_rect(position=position_dodge(.8))+
  scale_y_log10("y",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))+
  geom_errorbar(aes(ymin=ymin,ymax=ymax),
                size=.25,
                width=.07,
                position=position_dodge(.8))+
  theme_bw()

[Image: really bad idea of how to have a barplot with a log scale]

computermacgyver

If you need the bars flipped, you could calculate -log10(y) yourself; see the example:

library(ggplot2)
library(dplyr)

# make your own log10
dfPlot <- df %>% 
  mutate(ymin = -log10(ymin),
         ymax = -log10(ymax),
         ymean = -log10(ymean))

# then plot
ggplot(dfPlot, aes(x = id, y = ymean, fill = var, group = var)) +
  geom_bar(position = "dodge", stat = "identity",
           width = 0.7,
           size = 0.9)+
  geom_errorbar(aes(ymin = ymin, ymax = ymax),
                size = 0.25,   
                width = 0.07,
                position = position_dodge(0.7)) +
  scale_y_continuous(name = expression(-log[10](italic(ymean)))) + 
  theme_bw() 

[Image: plot]

zx8754
  • Many thanks for your time and help. Please,check plot2 vs plot3 and your answer. Plot2 is the values without any transformation and "low" level, the blue, bar is the lowest bar for A, B, C, D, E, and G. However, in plot3 and your answer it became the highest bar. I wonder why? – shiny Oct 09 '17 at 08:44
  • @aelwan Because this is what `-log10` does. Try: `-log10(10000); -log10(0.0001)` – zx8754 Oct 09 '17 at 08:54
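A quick base R check illustrates the reversal discussed in the comments. It is only a sketch using the three `ymean` values for id B from the question's data; the names `y` and `neg_log` are illustrative.

```r
# ymean for id B from the question's data
y <- c(high = 0.058076578, medium = 0.043017856, low = 0.00141152033333333)

# Same transformation as in the answer above
neg_log <- -log10(y)

# Ordering on the original scale, smallest first
names(sort(y))

# Ordering on the -log10 scale, smallest first: the order is reversed,
# so the level with the smallest value gets the tallest bar
names(sort(neg_log))
```

Because -log10 is a decreasing function, the smallest original value always maps to the largest transformed value.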
Firstly, don't do it! The help file from ?geom_bar says:

A bar chart uses height to represent a value, and so the base of the bar must always be shown to produce a valid visual comparison. Naomi Robbins has a nice article on this topic. This is why it doesn't make sense to use a log-scaled y axis with a bar chart.

To give a concrete example, the following is one way of producing the graph you want. Note that a larger k would be equally correct, yet it produces a visually different plot.

k<- 10000  

ggplot(df, aes(x=id, y=ymean*k , fill=var, group=var)) +
  geom_bar(position="dodge", stat="identity",
           width = 0.7,
           size=.9)+
  geom_errorbar(aes(ymin=ymin*k,ymax=ymax*k),
                size=.25,   
                width=.07,
                position=position_dodge(.7))+
  theme_bw() + scale_y_log10(labels=function(x)x/k)

k=1e4

[Image: plot when k=1e4]

k=1e6

[Image: plot when k=1e6]

Miff