
Ordering of factor levels in ggplot is a common issue, and there are a number of posts about it (e.g., "Avoid ggplot sorting the x-axis while plotting geom_bar()").

This may be a duplicate, but I haven't come across this particular situation.

I'm trying to keep the order of the x-axis variable ("cylinders") in a stacked bar plot. Here's a toy example. I recoded the variable below so the alphabetical ordering on the x-axis is easy to see, even though `cylinders` is a factor whose levels were explicitly set earlier, in the order "Four cyl", "Six cyl", "Eight cyl".

What am I doing wrong?

library(dplyr)
library(tidyr)
library(ggplot2)

mtcars <- mtcars %>% 
  mutate(cylinders = case_when(cyl == 4 ~ "Four cyl",
                               cyl == 6 ~ "Six cyl",
                               cyl == 8 ~ "Eight cyl"),
         cylinders = reorder(cylinders, cyl, mean)) %>% 
  mutate(engine = case_when(vs == 1 ~ "Manual",
                            vs == 0 ~ "Automatic"))

str(mtcars$cylinders)
levels(mtcars$cylinders)  # [1] "Four cyl"  "Six cyl"   "Eight cyl"
class(mtcars$cylinders)
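A minimal base-R sketch of what `reorder()` does in the code above, using a few hand-made toy values rather than the full data: given a character vector, it returns a factor whose levels are sorted by the mean of the second argument within each label, which is why `levels()` reports the intended order.

```r
# Toy cylinder counts and their labels (not the full mtcars data)
cyl <- c(4, 6, 8, 4, 8)
cyl_labels <- c("Four cyl", "Six cyl", "Eight cyl", "Four cyl", "Eight cyl")

# reorder() converts the character vector to a factor and sorts its levels
# by mean(cyl) within each label: 4 < 6 < 8
cylinders <- reorder(cyl_labels, cyl, mean)

class(cylinders)   # "factor"
levels(cylinders)  # "Four cyl" "Six cyl" "Eight cyl"
```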

facet_test <- function(df, gathvar) {

  gath <- enquo(gathvar)

  df %>% 
    select(cylinders, !!gath) %>%
    gather(key, value, -!!gath) %>%
    count(!!gath, key, value) %>%
    group_by(value) %>%
    mutate(perc = round(n/sum(n), 2) * 100) %>%  
    ggplot(aes(x = value, y = perc, fill = !!gath)) +
      geom_bar(stat = "identity")
}

facet_test(df = mtcars, gathvar = engine)

[Image: resulting plot, with the x-axis in alphabetical order]

Daniel
  • Run the internals of that function on your data. After the `gather` line, you have 3 columns: `engine`, `key`, and `value`. `value` is where your cylinder information is, but it isn't a factor, so there's no ordering. But I don't see why you need the `gather` anyway—you could have made this plot without it – camille Jul 12 '18 at 17:57
  • As in, take out the `gather` and go straight to `count`, then use `x = cylinders` in your `aes` – camille Jul 12 '18 at 18:01
  • Thanks. The `gather` is there because this is a truncated example of a longer, more complicated function. Any advice on how to make the `value` column retain the factor information? – Daniel Jul 12 '18 at 18:01
  • Try making `value` a factor and getting the levels by order of appearance in that column (such as using `forcats::fct_inorder`) – camille Jul 12 '18 at 18:02
  • Ok I'll try that. One moment... In the larger function, I'm using facet_wrap, so I have different gathering variables. – Daniel Jul 12 '18 at 18:03
  • Nope, I hit a wall. I'll keep trying - I'm all ears to any suggestions. – Daniel Jul 12 '18 at 18:14
  • I think troubleshooting outside `tidyeval` will make things clearer. As @camille pointed out, after you `gather()` your variable `value` is a character and `ggplot()` then makes it into a factor with default alphanumeric order. If you want a different order you'll have to manually define it. In this specific case you'd want to use `mutate(value = factor(value, levels = levels(mtcars$cylinders)) ) ` after `gather()`. Maybe you need to define an additional variable in your function to be used for setting factor order? – aosmith Jul 12 '18 at 18:40
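As the comments explain, once the values are plain character, `ggplot()` builds its own factor from them with alphabetically sorted levels. The base-R sketch below reproduces that on toy values and shows the two fixes suggested above: reusing the original levels (aosmith's suggestion), or ordering levels by first appearance. `factor(x, levels = unique(x))` is used here as a base-R stand-in for `forcats::fct_inorder`, assuming there are no `NA` values.

```r
# A factor with an explicit, non-alphabetical level order
cylinders <- factor(c("Six cyl", "Four cyl", "Eight cyl", "Four cyl"),
                    levels = c("Four cyl", "Six cyl", "Eight cyl"))

# Roughly what gather() leaves behind: the labels survive, the order does not
value <- as.character(cylinders)

# Re-factoring without levels sorts them alphabetically,
# which is the order ggplot() ends up using on the x-axis
alpha_levels <- levels(factor(value))

# Fix 1: reuse the levels of the original factor
fixed_levels <- levels(factor(value, levels = levels(cylinders)))

# Fix 2: order of first appearance; only correct if rows are already sorted
inorder_levels <- levels(factor(value, levels = unique(value)))

alpha_levels    # "Eight cyl" "Four cyl" "Six cyl"
fixed_levels    # "Four cyl" "Six cyl" "Eight cyl"
inorder_levels  # "Six cyl" "Four cyl" "Eight cyl"
```

The last line shows the catch with appearance order: it depends on row order, so it needs the data sorted by the original factor first, whereas explicit levels do not.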

1 Answer


Thanks to the comments and to @alistaire at this post (https://stackoverflow.com/a/39157585/8453014), I was able to arrive at a solution. The problem is that `gather()` coerces factors to character, so the level order is lost.

Simple scenario

As @aosmith suggested, add `mutate(value = factor(value, levels = levels(mtcars$cylinders)))` after `gather()`.
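Here is that one-line fix placed in the question's function, as a sketch assuming dplyr, tidyr and ggplot2 are installed. `mtcars2` and `facet_test_fixed` are hypothetical names, and the levels are read from `df$cylinders` rather than the global `mtcars$cylinders` so the function works on whatever data frame it receives.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same recoding as in the question, stored under a new name
mtcars2 <- mtcars %>%
  mutate(cylinders = case_when(cyl == 4 ~ "Four cyl",
                               cyl == 6 ~ "Six cyl",
                               cyl == 8 ~ "Eight cyl"),
         cylinders = reorder(cylinders, cyl, mean),
         engine = case_when(vs == 1 ~ "Manual",
                            vs == 0 ~ "Automatic"))

facet_test_fixed <- function(df, gathvar) {
  gath <- enquo(gathvar)

  df %>%
    select(cylinders, !!gath) %>%
    gather(key, value, -!!gath) %>%
    # restore the level order that gather() dropped
    mutate(value = factor(value, levels = levels(df$cylinders))) %>%
    count(!!gath, key, value) %>%
    group_by(value) %>%
    mutate(perc = round(n / sum(n), 2) * 100) %>%
    ggplot(aes(x = value, y = perc, fill = !!gath)) +
      geom_bar(stat = "identity")
}

p <- facet_test_fixed(df = mtcars2, gathvar = engine)
```

Because `factor()` also accepts an existing factor, the added step gives the same result whether `gather()` hands back character or factor values.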

Complex example with multiple variables

The important aspects are: 1) define the factor levels (inside or outside the function) and 2) apply those levels to the `value` column.
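Those two steps can be wrapped in a small base-R helper, sketched below. `apply_levels` is a hypothetical name, not part of the answer; it combines several level sets, drops repeated labels (since `factor()` refuses duplicated levels), and warns about values that match no level, because `factor()` otherwise turns them into `NA` silently.

```r
# Hypothetical helper: apply the combined level sets to one column of values
apply_levels <- function(values, ...) {
  # unique() drops labels that appear in more than one level set
  all_levels <- unique(c(...))
  out <- factor(values, levels = all_levels)
  # Values absent from all_levels become NA, so report them
  unmatched <- setdiff(unique(values), all_levels)
  if (length(unmatched) > 0) {
    warning("values not found in levels: ", paste(unmatched, collapse = ", "))
  }
  out
}

levels_cyl <- c("Four cyl", "Six cyl", "Eight cyl")
levels_gears <- c("Three", "Four", "Five")

# A gathered `value` column mixes labels from both variables
value <- c("Six cyl", "Five", "Four cyl", "Three", "Eight cyl", "Four")
f <- apply_levels(value, levels_cyl, levels_gears)

as.character(sort(f))
# "Four cyl" "Six cyl" "Eight cyl" "Three" "Four" "Five"
```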

Here's a more complicated example that uses three variables and `facet_wrap()` to show the plots side by side:

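# The function expects mtcars to also have a `gears` column;
# one way to build it, with labels matching levels_gears below:
mtcars <- mtcars %>%
  mutate(gears = case_when(gear == 3 ~ "Three",
                           gear == 4 ~ "Four",
                           gear == 5 ~ "Five"))

facet_test <- function(df, gathvar, legend_title) {
  # note: legend_title is accepted but not used below
  gath <- enquo(gathvar)

  # next two lines can go inside or outside of the function
  levels_cyl <- c("Four cyl", "Six cyl", "Eight cyl")
  levels_gears <- c("Three", "Four", "Five")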

  df %>% 
    select(cylinders, gears, !!gath) %>%
    gather(key, value, -!!gath) %>%
    count(!!gath, key, value) %>%
    ungroup() %>% 
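    # unique() guards against a label appearing in both level sets,
    # since factor() rejects duplicated levels
    mutate(value = factor(value, levels = unique(c(levels_cyl, levels_gears),
                                                 fromLast = TRUE))) %>% 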
    arrange(key, value) %>%  
    ggplot(aes(x = value, y = n, fill = !!gath)) +
      geom_bar(stat = "identity") +
      facet_wrap(~ key, scales = "free_x")
}

facet_test(df = mtcars, gathvar = engine)
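One possible refinement, not part of the original answer: instead of typing `levels_cyl` and `levels_gears` by hand, read the level order off the factor columns before `gather()` discards it. The base-R sketch below assumes both columns are factors; the lookup vectors `cyl_labels` and `gear_labels` are hypothetical and simply mirror the labels used above.

```r
# Hypothetical lookup tables from numeric codes to the labels used above
cyl_labels  <- c("4" = "Four cyl", "6" = "Six cyl", "8" = "Eight cyl")
gear_labels <- c("3" = "Three", "4" = "Four", "5" = "Five")

# Both columns as factors, levels ordered by their numeric code
cars <- data.frame(
  cylinders = reorder(unname(cyl_labels[as.character(mtcars$cyl)]), mtcars$cyl, mean),
  gears     = reorder(unname(gear_labels[as.character(mtcars$gear)]), mtcars$gear, mean)
)

# Collect the level order of every column that will be gathered,
# column by column, while the columns are still factors
all_levels <- unique(unlist(lapply(cars[c("cylinders", "gears")], levels),
                            use.names = FALSE))

all_levels
# "Four cyl" "Six cyl" "Eight cyl" "Three" "Four" "Five"
```

Inside the function, a vector built this way could then be passed as `levels = all_levels` in the `mutate(value = factor(...))` step.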

[Image: correct plot with factor levels in the pre-defined order]

Daniel