
There are several questions (here, for instance) on how to order the x-axis by frequency in a bar chart with ggplot2. However, my aim is to order the categories on the x-axis of a stacked bar chart by the relative frequency of one level of the fill variable. For instance, I would like to sort the x-axis by the percentage of category B in variable z.

This was my first try, using only ggplot2:

library(ggplot2)
library(tibble)
library(scales)

factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
factor2 <- as.factor(c("A", "B"))

set.seed(43)
data <- tibble(x = sample(factor1, 1000, replace = TRUE),
               z = sample(factor2, 1000, replace = TRUE))


ggplot(data = data, aes(x = x, fill = z, order = z)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent)

When that didn't work, I created a summarised data frame with dplyr, spread the data, sorted it by B, and gathered it again. Plotting that didn't work either:

library(dplyr)
library(tidyr)
data %>%
  group_by(x, z) %>%
  count() %>%
  spread(z, n) %>%
  arrange(-B) %>%
  gather(z, n, -x) %>%
  ggplot(aes(x = reorder(x, n), y = n, fill = z)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = percent)

I would prefer a solution with ggplot2 only, so that I am not dependent on the row order of the data frame created by dplyr/tidyr. However, I'm open to anything.

FilipW
  • @docendodiscimus I've specified the question, it's primarily the relative frequency that I'm interested in. Thanks! – FilipW Feb 12 '18 at 14:23

2 Answers


If you want to sort by absolute frequency:

lvls <- names(sort(table(data[data$z == "B", "x"])))

If you want to sort by relative frequency:

lvls <- names(sort(tapply(data$z == "B", data$x, mean)))

Then you can create the factor on the fly inside ggplot:

ggplot(data = data, aes(factor(x, levels = lvls), fill = z)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent)
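
To check that the levels really come out ordered by the share of B, or to put the largest share first, something along these lines may help. It is only a sketch: it rebuilds the example data as a plain data.frame instead of a tibble so that it needs base R only, and `share_b` and `lvls_desc` are names made up here for illustration.

```r
# Rebuild the example data (plain data.frame instead of a tibble,
# an assumption made so the check needs nothing beyond base R)
factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
factor2 <- as.factor(c("A", "B"))

set.seed(43)
data <- data.frame(x = sample(factor1, 1000, replace = TRUE),
                   z = sample(factor2, 1000, replace = TRUE))

# Share of B within each level of x (illustrative name)
share_b <- tapply(data$z == "B", data$x, mean)

# Ascending order, as in the relative-frequency line above
lvls <- names(sort(share_b))

# Descending order: largest share of B first (illustrative name)
lvls_desc <- names(sort(share_b, decreasing = TRUE))

# Print the shares in plotting order to inspect them
print(round(share_b[lvls], 3))
```

Passing `lvls_desc` instead of `lvls` to `factor(x, levels = ...)` should then reverse the order of the bars.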

talat
  • I've clarified the question. This answer was useful, but it is primarily the relative frequency that I want to sort by, not the absolute. – FilipW Feb 12 '18 at 14:21

A solution using tidyverse would be:

data %>%
  # as.numeric(z) is 1 for A and 2 for B, so the mean per x rises with the share of B
  mutate(x = forcats::fct_reorder(x, as.numeric(z), .fun = mean)) %>%
  ggplot(aes(x, fill = z)) +
    geom_bar(position = "fill") +
    scale_y_continuous(labels = percent)

https://discourse-cdn-sjc1.com/business3/uploads/tidyverse/original/2X/5/588440b7c5a8df6bcb60a35b08cb22fc9f778090.png
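
Why averaging `as.numeric(z)` sorts by the share of B may not be obvious, so here is a small base R sketch of the idea. It assumes the example data rebuilt as a plain data.frame, and it swaps in `stats::reorder()` as a stand-in for `fct_reorder()`, so it illustrates the ordering logic rather than the forcats call itself. `mean_code`, `share_b` and `x_sorted` are names made up for this example.

```r
# Rebuild the example data (plain data.frame, so only base R is needed)
factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
factor2 <- as.factor(c("A", "B"))

set.seed(43)
data <- data.frame(x = sample(factor1, 1000, replace = TRUE),
                   z = sample(factor2, 1000, replace = TRUE))

# as.numeric(z) is 1 for "A" and 2 for "B", so its mean per level of x
# equals 1 + (share of B); ordering by it is ordering by the share of B
mean_code <- tapply(as.numeric(data$z), data$x, mean)
share_b   <- tapply(data$z == "B", data$x, mean)

# stats::reorder() sorts the levels of x by a per-level summary (ascending)
x_sorted <- reorder(data$x, data$z == "B", FUN = mean)
print(levels(x_sorted))
```

Since `reorder()` is base R, it should also be usable directly inside `aes()`, for example `aes(x = reorder(x, z == "B", FUN = mean), fill = z)`, which avoids the intermediate `mutate()` step. I have not run that variant against the plot here, so treat it as a sketch.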
