63

I have the following problem: I would like to visualize a continuous variable against a discrete one in a boxplot, but the continuous variable has a few extremely high values. These make the boxplot meaningless (the points and even the "body" of the chart are too small), which is why I would like to show it on a log10 scale. I am aware that I could leave the extreme values out of the visualization, but I do not want to.

Let's see a simple example with diamonds data:

library(ggplot2)
m <- ggplot(diamonds, aes(y = price, x = color))

The problem is not serious here, but I hope you can imagine why I would like to see the values on a log10 scale. Let's try it:

m + geom_boxplot() + coord_trans(y = "log10")

As you can see the y axis is log10 scaled and looks fine but there is a problem with the x axis, which makes the plot very strange.

The problem does not occur with scale_y_log10, but that is not an option for me, as I cannot use a custom formatter this way. E.g.:

m + geom_boxplot() + scale_y_log10() 

My question: does anyone know a way to draw the boxplot with a log10 scale on the y axis whose labels can be freely formatted with a formatter function, like in this thread?


Editing the question to help answerers based on answers and comments:

What I am really after: one log10-transformed axis (y) with non-scientific labels. I would like to label it like dollar (formatter=dollar) or with any custom format.

If I try @hadley's suggestion I get the following warnings:

> m + geom_boxplot() + scale_y_log10(formatter=dollar)
Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

The y axis labels remain unchanged.

daroczig
  • That's a bug in `coord_trans` - but you can specify custom labels to `scale_y_log10`... – hadley Jan 15 '11 at 14:34
  • Thank you @hadley, I must be missing something, but e.g. `+ scale_y_continous(formatter=dollar)` just does not work. I cannot see the result of any formatter given, and I also get three `In max(x) : no non-missing arguments to max; returning -Inf` warning messages. – daroczig Jan 15 '11 at 16:33
  • @daroczig: The examples I have seen for the formatter argument have all involved quoted names, so perhaps `formatter="dollar"`? – IRTFM Jan 15 '11 at 16:56
  • @DWin: I tried with quotes also, but the result is exactly the same. – daroczig Jan 15 '11 at 17:00
  • Formatter doesn't work (yet) but you can still set the labels manually... – hadley Jan 15 '11 at 17:42
  • @hadley: I will look into this (**manual/custom labels**) also. For now, it looks like data transformation plus a `scale_y_continuous` formatter solved the problem. Thanks! – daroczig Jan 15 '11 at 17:56

4 Answers

48

The simplest is to just give the 'trans' (formerly 'formatter') argument of either scale_x_continuous or scale_y_continuous the name of the desired log function:

library(ggplot2)  # which formerly required pkg:plyr
m + geom_boxplot() + scale_y_continuous(trans='log10')
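
For the goal stated in the question (a log10 y axis with dollar labels), here is a minimal sketch. It assumes a current ggplot2 in which the old `formatter` argument has been replaced by `labels`, as the comments further down suggest, and that the scales package is installed. The variable name `p` is only for illustration.

```r
library(ggplot2)

# Sketch: log10 y axis with dollar-formatted tick labels.
# scales::dollar is passed as a function, so ggplot2 applies it
# to whatever breaks the log10 scale picks.
p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  scale_y_log10(labels = scales::dollar)
p
```

Because the scale itself does the transformation, the data stay untransformed and the labels show the original price values.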

EDIT: Or if you don't like that, then either of these appears to give different but useful results:

m <- ggplot(diamonds, aes(y = price, x = color), log="y")
m + geom_boxplot() 
m <- ggplot(diamonds, aes(y = price, x = color), log10="y")
m + geom_boxplot()

EDIT2 & 3: Further experiments (after discarding the one that attempted successfully to put "$" signs in front of logged values):

# The label function must accept the break values as its argument x;
# it back-transforms them and wraps the desired formatting around the result
fmtExpLg10 <- function(x) paste(plyr::round_any(10^x/1000, 0.01) , "K $", sep="")

ggplot(diamonds, aes(color, log10(price))) + 
  geom_boxplot() + 
  # pass the formatter as 'labels'; 'trans' expects a transformation, not a label function
  scale_y_continuous("Price, log10-scaling", labels = fmtExpLg10)
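
To see what such a back-transforming formatter produces, it can be called directly on a few log10 values. The sketch below is a hypothetical base-R variant (the name `fmt_log10_k` is made up here, and `round` stands in for `plyr::round_any`), not the exact function above.

```r
# Hypothetical base-R variant of the formatter: back-transform log10
# values and express them in thousands of dollars.
fmt_log10_k <- function(x) paste0(round(10^x / 1000, 2), "K $")

fmt_log10_k(c(3, 4))
# "1K $"  "10K $"
```

The tick positions stay on the log10 scale; only the text shown at each tick is converted back.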

Note added mid 2017 in comment about package syntax change:

scale_y_continuous(formatter = 'log10') is now scale_y_continuous(trans = 'log10') (ggplot2 v2.2.1)

IRTFM
  • Thank you @DWin, but this is not the one I was looking for. This way the y axis' labels will be converted to log10, but the axis will not be transformed. What I would like to get: one transformed axis (y) with not scientific labels. – daroczig Jan 15 '11 at 16:37
  • @daroczig: See if this is more satisfactory. I would have sworn that the first time I ran my first solution that I got even powers of ten but I cannot reproduce. Maybe I was so focused on seeing the x-positions that I overlooked the obvious problems – IRTFM Jan 15 '11 at 16:52
  • Thank you @DWin, I just tested your proposals, but as I can see both commands give back the same: the first image I attached to my question. What I would like to get: the last plots in my question (no. 3 and 4, as they are the same) with customizable label formatting. – daroczig Jan 15 '11 at 16:59
  • @daroczig: The "successful experiment" with "dollarizing" used `fmtLg10dlr <- function(x) dollar(log10(x)); m + geom_boxplot() + scale_y_continuous(formatter='fmtLg10dlr')` , but it just looks "wrong" to me. – IRTFM Jan 15 '11 at 17:38
  • I suspect you're trying to do something like `ggplot(diamonds, aes(color, log10(price))) + geom_boxplot() + scale_y_continuous(formatter = function(x) format(10 ^ x))` - you need to transform the data and back-transform the labels. – hadley Jan 15 '11 at 17:44
  • @DWin and @hadley: thank you both, I just got to the same solution fifteen minutes before, that I have to transform and later retransform the data. See the other answer. Sorry for bothering! – daroczig Jan 15 '11 at 17:48
  • @hadley: Got it. Thks. But shouldn't you fix the ylab, now that it is not logged values at the tick marks? – IRTFM Jan 15 '11 at 17:57
  • @DWin: please update your answer to first transform the data and after apply the formatter function as discussed here in the comments, that I would be able to accept and upvote your answer. Thank you! – daroczig Jan 16 '11 at 10:27
  • @daroczig: I did so and added the fix I was suggesting for scale label. – IRTFM Jan 16 '11 at 15:40
  • Another similar solution, using `sprintf`: `fmtdol<- function(x)sprintf('$%sK',x/1000)` – NiuBiBang Jan 18 '14 at 03:58
  • `scale_y_continuous(formatter = 'log10')` is now `scale_y_continuous(trans = 'log10')` (ggplot2 v2.2.1) – pat-s Jun 12 '17 at 10:08
  • getting an error `cannot coerce type 'closure' to vector of type 'character'` when using a function – Dima Lituiev Aug 06 '19 at 18:26
  • I now get the same error. The 'scales' package appears to have changed its mechanism for handling transformations. User-defined transformations no longer succeed for a variety of reasons, one of them a failure with naming and another a difficulty with scoping. See `help(pac='scales', as.trans)` – IRTFM Aug 06 '19 at 20:15
19

I had a similar problem and this scale worked for me like a charm:

library(scales)  # provides comma()
breaks = 10**(1:10)
scale_y_log10(breaks = breaks, labels = comma(breaks))

As you want the intermediate levels, too (10^3.5), you need to tweak the formatting:

breaks = 10**(1:10 * 0.5)
m <- ggplot(diamonds, aes(y = price, x = color)) + geom_boxplot()
m + scale_y_log10(breaks = breaks, labels = comma(breaks, digits = 1))
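
The labels do not have to be precomputed as a vector. As a sketch, assuming a current ggplot2 where `labels` accepts a function, a custom formatter can be passed directly. The `fmtdol` helper follows the `sprintf` idea from the comments on the first answer, and `p` is just an illustrative name.

```r
library(ggplot2)

# Custom formatter: express values in thousands of dollars.
fmtdol <- function(x) sprintf("$%sK", x / 1000)

breaks <- 10^(1:10)
p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  # ggplot2 calls fmtdol on the breaks that fall inside the axis range
  scale_y_log10(breaks = breaks, labels = fmtdol)

fmtdol(c(1000, 10000))
# "$1K"  "$10K"
```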


daroczig
  • I just noticed this [very similar problem](http://stackoverflow.com/questions/2906855/how-to-override-ggplot2s-axis-formatting) has the same solution. – Susanne Oberhauser Feb 09 '11 at 10:47
  • Thank you for pointing my attention to this alternate solution, which would be complete with specifying the simple `dollar` formatter or by writing a custom one: `+ scale_y_log10(breaks = breaks, labels = dollar(breaks))` – daroczig Feb 09 '11 at 12:56
10

Another solution using scale_y_log10 with trans_breaks, trans_format and annotation_logticks()

library(ggplot2)

m <- ggplot(diamonds, aes(y = price, x = color))

m + geom_boxplot() +
  scale_y_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  theme_bw() +
  annotation_logticks(sides = 'lr') +
  theme(panel.grid.minor = element_blank())
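
For anyone wondering what the two helpers do: `trans_breaks()` and `trans_format()` each return a function, which ggplot2 then calls itself. The sketch below calls them by hand on the price range to inspect the result. The names `break_fun`, `label_fun` and `brks` are illustrative only, and the exact breaks may differ between versions of scales.

```r
library(scales)

# Both helpers return functions rather than values.
break_fun <- trans_breaks("log10", function(x) 10^x)
label_fun <- trans_format("log10", math_format(10^.x))

# Call them manually on the range of diamonds$price for illustration.
brks <- break_fun(range(ggplot2::diamonds$price))
brks              # break positions on the original price scale
label_fun(brks)   # plotmath expressions used as tick labels
```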

Tung
1

I think I got it at last by doing some manual transformations with the data before visualization:

d <- diamonds
# computing logarithm of prices
d$price <- log10(d$price)

Then define a formatter that later back-transforms the logged values for the labels:

formatBack <- function(x) 10^x 
# or with special formatter (here: "dollar")
formatBack <- function(x) paste(round(10^x, 2), "$", sep=' ') 

And draw the plot with given formatter:

m <- ggplot(d, aes(y = price, x = color))
# 'formatter' was the argument name at the time; current ggplot2 uses 'labels'
m + geom_boxplot() + scale_y_continuous(labels = formatBack)

Sorry to the community for bothering you with a question I could have solved myself! The funny part is: I was working hard to make this plot work a month ago but did not succeed. After asking here, I got it.

Anyway, thanks to @DWin for motivation!

daroczig
  • I think formatter now changed to labels => https://stackoverflow.com/questions/10146109/formatter-argument-in-scale-continuous-throwing-errors-in-r-2-15 – Bernie2436 Apr 24 '19 at 19:25