
ggplot generally does a good job of creating sensible breaks and labels in scales.

However, I find that in a plot with many facets, and perhaps a formatter= argument, the labels tend to get too dense and overprint, as in the plot produced by this code:

df <- data.frame(
        fac=rep(LETTERS[1:10], 100),
        x=rnorm(1000)
)

ggplot(df, aes(x=x)) + 
  geom_bar(binwidth=0.5) + 
  facet_grid(~fac) + 
  scale_x_continuous(formatter="percent")


I know that I can specify the breaks and labels of scales explicitly, by providing breaks= and labels= arguments to scale_x_continuous.

However, I am processing survey data with many questions and a dozen crossbreaks, so I need to find a way to do this automatically.

Is there a way of telling ggplot to calculate breaks and labels automatically, but just have fewer, say at the minimum, maximum and zero point?

EDIT: Ideally, I don't want to specify the minimum and maximum points, but somehow tap into the built-in ggplot training of scales, and use the default calculated scale limits.

Andrie

2 Answers


You can pass in arguments such as min() and max() in your call to ggplot to dynamically specify the breaks. It sounds like you are going to be applying this across a wide variety of data so you may want to consider generalizing this into a function and messing with the formatting, but this approach should work:

ggplot(df, aes(x=x)) + 
  geom_bar(binwidth=0.5) + 
  facet_grid(~fac) + 
  scale_x_continuous(
    breaks = c(min(df$x), 0, max(df$x)),
    labels = c(paste0(100 * round(min(df$x), 2), "%"),
               "0%",
               paste0(100 * round(max(df$x), 2), "%"))
  )

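Or rotate the x-axis text with opts(axis.text.x = theme_text(angle = 90, hjust = 0)). In current ggplot2 releases, where opts() and theme_text() are no longer available, the equivalent is theme(axis.text.x = element_text(angle = 90, hjust = 0)).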

Update

In the latest version of ggplot2 the breaks and labels arguments to scale_x_continuous accept functions, so one can do something like the following:

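myBreaks <- function(x){
    # x is the vector of scale limits that ggplot2 passes to a breaks function
    breaks <- c(min(x),median(x),max(x))
    # a plain numeric vector has no "labels" attribute, so this leaves breaks unnamed
    names(breaks) <- attr(breaks,"labels")
    breaks
}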

ggplot(df, aes(x=x)) + 
  geom_bar(binwidth=0.5) + 
  facet_grid(~fac) + 
  scale_x_continuous(breaks = myBreaks, labels = scales::percent_format()) + 
  opts(axis.text.x = theme_text(angle = 90, hjust = 1,size = 5))
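
With a current ggplot2 release, the same idea might look like the sketch below. It relies on ggplot2 calling a function supplied as `breaks` with the scale's limits, so nothing is computed from `df` by hand. Depending on the ggplot2 version, the values passed in may include the axis expansion, in which case the outer breaks sit at the panel edges. `limit_breaks` is a hypothetical helper name, and `geom_histogram()`, `scales::label_percent()` and `theme()`/`element_text()` are stand-ins for the older calls used above.

```r
# Hypothetical helper: breaks at the lower limit, at zero (if in range),
# and at the upper limit of whatever range ggplot2 passes in.
limit_breaks <- function(limits) {
  candidates <- c(limits[1], 0, limits[2])
  in_range <- candidates >= limits[1] & candidates <= limits[2]
  sort(unique(candidates[in_range]))
}

limit_breaks(c(-3.2, 2.9))  # both limits plus zero
limit_breaks(c(10, 60))     # zero is out of range, so only the limits remain

# Plot only if the packages are installed (assumption: recent ggplot2 and scales)
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  df <- data.frame(fac = rep(LETTERS[1:10], 100), x = rnorm(1000))

  p <- ggplot(df, aes(x = x)) +
    geom_histogram(binwidth = 0.5) +
    facet_grid(~fac, scales = "free_x") +
    scale_x_continuous(breaks = limit_breaks,
                       labels = scales::label_percent(accuracy = 1)) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 5))
  print(p)
}
```

With `scales = "free_x"` each facet has its own trained range, so the breaks function should be evaluated per panel rather than once for the whole data set.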
joran
Chase
  • @Chase Thank you. Yes, I have considered doing this, but it isn't ideal. The reason is that the data could be percentages, respondent counts, t-stat scores, or whatever. Calculating the nearest magnitude might be an option, but really what I want to do is to use the scale that ggplot trained on, and then hide the labels between the end points. In other words, sometimes I want the upper end of the scale to be (for example) 60%. I hope this makes sense. – Andrie Mar 21 '11 at 17:30
  • @Andrie - got it. So what you really need here is a function that interprets the type of data shown on the x-axis (percentages, counts, etc.) and modifies the scale accordingly, right? Can you use `class()` on the columns to help inform this? Or some other data/metadata that indicates what exactly you are plotting? It shouldn't be too difficult to write a small function to generate the vector of breaks and labels to pass into `scale_x_continuous()`, assuming you have some info about what to format and how. – Chase Mar 21 '11 at 17:50
  • @Chase I am hoping someone will provide a more generic approach. For example, when working with facets and free scales, e.g. facet_grid(~fac, scales="free"), the high and low break points will in general be different for each facet. So what I am really after is to suppress the labels without specifying the breaks. – Andrie Mar 21 '11 at 20:49
  • @Andrie maybe you can provide an updated set of sample data that better illustrates your problem? From what I can tell, you have at least two different issues: 1. overplotting of the scale axis, 2. using the same code chunk to present the same data in different lights. You could address the overplotting with something like `... + opts(axis.text.x = theme_text(angle = 90, hjust = 0))`. If you want to move beyond formatting issues, I think you are going to have to write your own function to supply the `labels` and `breaks` arguments. – Chase Mar 21 '11 at 21:26
  • +1 for suggesting changing the angle of text and size of text. This will help with my immediate presentational needs. – Andrie Mar 21 '11 at 21:55
  • +1.5 for accept. FYI, Hadley Wickham responded in another forum that what I want to do isn't easy to do at the moment, but in a near future release one will have more control over the breaks and tickmarks. – Andrie Mar 22 '11 at 12:48

The scales package contains several breaks_* and label_* functions which return functions (closures) that are used by ggplot. So you can write wrappers around these that modify their output.

For example:

library(ggplot2)

# Compute the list of breaks using original_func,
# then remove any of these that occur in remove_list
remove_breaks <- function(original_func, remove_list = list()) {
  function(x) {
    original_result <- original_func(x)
    original_result[!(original_result %in% remove_list)]
  }
}

# Compute the list of labels using original_func,
# then remove any of these that occur in remove_list
remove_labels <- function(original_func, remove_list = list()) {
  function(x) {
    original_result <- original_func(x)
    replace(original_result, original_result %in% remove_list, '')
  }
}

# Original plot
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
  scale_x_continuous(breaks       = scales::breaks_pretty(9),
                     minor_breaks = scales::breaks_pretty(18),
                     labels       = scales::label_number_auto()) +
  scale_y_continuous(breaks       = scales::breaks_pretty(9),
                     minor_breaks = scales::breaks_pretty(18),
                     labels       = scales::label_number_auto())

# Remove some breaks from the x-axis, and remove some labels from the y-axis
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
  scale_x_continuous(breaks       = remove_breaks(scales::breaks_pretty(9), seq(3,6)),
                     minor_breaks = remove_breaks(scales::breaks_pretty(18), seq(3,6,0.5)),
                     labels       = scales::label_number_auto()) +
  scale_y_continuous(breaks       = scales::breaks_pretty(9),
                     minor_breaks = scales::breaks_pretty(18),
                     labels       = remove_labels(scales::label_number_auto(), seq(20, 30)))

Of course, with my simple remove_breaks and remove_labels functions you still have to specify which values to remove, but you can easily modify these to something that removes the max and min value, removes any value in a specified range, etc.
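
As one such modification, here is a minimal sketch of a wrapper that keeps whatever breaks the scale computes but blanks every label except the lowest, the highest and (optionally) zero. `keep_end_labels` and its `keep_zero` argument are hypothetical names introduced for illustration. The sketch assumes the label function returns a character vector, and that breaks falling outside the plotted range may arrive as `NA`.

```r
# Hypothetical wrapper: compute labels with label_func, then blank all of them
# except those at the smallest break, the largest break, and (optionally) zero.
keep_end_labels <- function(label_func, keep_zero = TRUE) {
  function(x) {
    labels <- label_func(x)
    # Nothing to compare against if there are no usable breaks
    if (length(x) == 0 || all(is.na(x))) return(labels)
    keep <- x %in% range(x, na.rm = TRUE)
    if (keep_zero) keep <- keep | (x %in% 0)
    replace(labels, !keep, "")
  }
}

# Works with any label function, e.g. base R's as.character
keep_end_labels(as.character)(c(-2, -1, 0, 1, 2))

# Plot only if the packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  d <- data.frame(x = 1:8, y = (1:8)^2)
  p <- ggplot(d) + geom_line(aes(x, y)) +
    scale_y_continuous(breaks = scales::breaks_pretty(9),
                       labels = keep_end_labels(scales::label_number_auto()))
  print(p)
}
```

Because only the labels are blanked, the tick marks and grid lines stay where the scale put them, and nothing has to be specified about the data range in advance.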

Tim Goodman