I have a df that I intend to visualise as a stacked percentage bar plot, with the stacks ordered in descending order. The df contains proportions and has been transformed into long format. Below is a reprex with dummy data based on the real data I'm using.

library(dplyr)   # %>% and filter()
library(tidyr)   # pivot_longer()
library(ggplot2)

df<- data.frame(ID=c("A","B","C","D","E"),
                a1=c((0.452),(0.558),(0.554),(0.484),(0.661)),
                a2=c((0.326),(0.373),(0.465),(0.434),(0.499)),
                a3=c((0.450),(0.481),(0.613),(0.473),(0.504)),
                a4=c((0.561),(0.681),(0.633),(0.504),(0.723)))

dflong<-df%>%
  pivot_longer(!ID, names_to="aa", values_to="prop")

dflong$ID<-as.factor(dflong$ID)

# A tibble: 20 × 3
   ID    aa     prop
   <fct> <chr> <dbl>
 1 A     a1    0.452
 2 A     a2    0.326
 3 A     a3    0.45 
 4 A     a4    0.561
 5 B     a1    0.558
 6 B     a2    0.373
 7 B     a3    0.481
 8 B     a4    0.681
 9 C     a1    0.554
10 C     a2    0.465
11 C     a3    0.613
12 C     a4    0.633
13 D     a1    0.484
14 D     a2    0.434
15 D     a3    0.473
16 D     a4    0.504
17 E     a1    0.661
18 E     a2    0.499
19 E     a3    0.504
20 E     a4    0.723

dflong %>%
  ggplot(aes(x=ID,y=prop, fill=reorder(aa,-prop))) +
  geom_col(position ="fill", data=dflong%>%filter(ID=="A")) +
  geom_col(position ="fill", data=dflong%>%filter(ID=="B")) +
  geom_col(position ="fill", data=dflong%>%filter(ID=="C")) +
  geom_col(position ="fill", data=dflong%>%filter(ID=="D")) +
  geom_col(position ="fill", data=dflong%>%filter(ID=="E")) +
  geom_text(aes(label=scales::percent(prop)),
            position=position_fill(vjust=.5), size=3, colour="black") +
  scale_y_continuous(labels = NULL, breaks = NULL)+
  scale_fill_brewer(palette="GnBu",
                    name="")+
  coord_flip()+
  theme_minimal()+
  theme(legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(caption="",
       x="",
       y="")

My problem is that the resulting plot always swaps two values within ID "C": the stacked bar shows the values of C-a1 and C-a3 exchanged. The image of the plot will demonstrate the issue clearly.

For ID C, a1 should be 55.4% and a3 should be 61.3%.

I have tried converting ID to a factor, converting aa to a factor, reordering the values in the original df, starting a new session, updating RStudio, and running the code in plain R (in case it was an RStudio issue), but nothing I have done so far has fixed the problem. It seems to affect only one ID, C, and only the values of a1 and a3. I'm at my wits' end and would appreciate any kind of help, as the console doesn't report any error that could explain this.

EDIT: The reason I have used five separate geom_col calls is to ensure that each column is stacked in the order I want. This was largely influenced by the answer to this question.

faaa96
  • Any reason you use five geom_col to create your barchart? I mean one `geom_col(position ="fill")` should be sufficient and will fix the issue. – stefan Oct 25 '22 at 07:36
  • Hi! I used 5 separate geom_cols because I wanted to make sure each column is stacked in descending order (i.e. the smallest proportion goes at the bottom). Using one geom_col call will fix the issue of switched values, but it means that the columns won't be stacked in descending order. – faaa96 Oct 25 '22 at 08:31

2 Answers

Using group_by(ID) in the dplyr pipe before ggplot instead of filtering inside ggplot worked, I think. Otherwise the values are sorted ascending and your group C seems to be the only one where the value in a3 is larger than the one in a1.

dflong %>%
  group_by(ID) %>% 
  ggplot(aes(x=ID,y=prop, fill=reorder(aa,-prop))) +
  geom_col(position ="fill") +
  geom_text(aes(label=scales::percent(prop)),
            position=position_fill(vjust=.5), size=3, colour="black") +
  scale_y_continuous(labels = NULL, breaks = NULL)+
  scale_fill_brewer(palette="GnBu",
                    name="")+
  coord_flip()+
  theme_minimal()+
  theme(legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(caption="",
       x="",
       y="")
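
The point about group C can be checked without plotting anything. Below is a minimal sketch, assuming base R only: the long data is rebuilt by hand (my stand-in for pivot_longer, not the code from the question), and `levels_all` and `levels_by_id` are illustrative names. It compares the level order that reorder(aa, -prop) gives on the whole data with the order it gives inside each ID, which is roughly what each filtered geom_col layer sees.

```r
# Rebuild the question's data in long format (base R stand-in for pivot_longer).
df <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 a1 = c(0.452, 0.558, 0.554, 0.484, 0.661),
                 a2 = c(0.326, 0.373, 0.465, 0.434, 0.499),
                 a3 = c(0.450, 0.481, 0.613, 0.473, 0.504),
                 a4 = c(0.561, 0.681, 0.633, 0.504, 0.723))
dflong <- data.frame(
  ID   = rep(df$ID, each = 4),
  aa   = rep(c("a1", "a2", "a3", "a4"), times = 5),
  prop = as.vector(t(as.matrix(df[, -1])))  # row-wise: A a1, A a2, ...
)

# Level order on the full data (what the geom_text layer works with).
levels_all <- levels(reorder(dflong$aa, -dflong$prop))

# Level order inside each ID (what a layer filtered to one ID works with).
levels_by_id <- lapply(split(dflong, dflong$ID),
                       function(d) levels(reorder(d$aa, -d$prop)))

print(levels_all)       # "a4" "a1" "a3" "a2"
print(levels_by_id$C)   # "a4" "a3" "a1" "a2"

# IDs whose own order differs from the overall order.
print(names(Filter(function(l) !identical(l, levels_all), levels_by_id)))  # "C"
```

If this reading is right, the bars for C are stacked in one order while the labels are stacked in another, which would explain why only C-a1 and C-a3 appear swapped.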

Kim Ferrari
  • Hello, thank you for your answer! I've tried this code and although it fixes the issue of switched a1 & a3 values, as you said because a3 is larger than a1 in group C, the stacks end up not being in descending order. The reason I've broken down the code into 5 separate `geom_col` is to make sure each column is stacked in descending order. – faaa96 Oct 25 '22 at 08:38

One option to achieve your desired result, i.e. ordering the stacks in increasing order of proportion and fixing your issue, would be to first create a grouping column that encodes your desired order and then map it to the group aes. One additional benefit: a single geom_col is sufficient when doing it this way.

library(ggplot2)
library(tidyr)
library(forcats)
library(dplyr)

dflong <- df %>%
  pivot_longer(!ID, names_to = "aa", values_to = "prop") %>%
  mutate(ID = factor(ID)) %>%
  # Create grouping column using arrange and fct_inorder
  arrange(ID, desc(prop)) %>%
  mutate(group = fct_inorder(paste(ID, aa, sep = ".")),
         aa = reorder(aa, -prop))

dflong %>%
  ggplot(aes(x = ID, y = prop, fill = aa, group = group)) +
  geom_col(position = "fill") +
  geom_text(aes(label = scales::percent(prop)),
    position = position_fill(vjust = .5), size = 3, colour = "black"
  ) +
  scale_y_continuous(labels = NULL, breaks = NULL) +
  scale_fill_brewer(
    palette = "GnBu",
    name = ""
  ) +
  coord_flip() +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal"
  ) +
  labs(
    caption = "",
    x = "",
    y = ""
  )
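
To see what the grouping column encodes, here is a minimal sketch in base R. It is an approximation of the pipeline above, not the same calls: order() stands in for arrange(ID, desc(prop)), factor(levels = unique(...)) stands in for fct_inorder(), and `group_levels_C` and `is_desc` are names made up for the illustration.

```r
# Rebuild the question's data in long format (base R stand-in for pivot_longer).
df <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 a1 = c(0.452, 0.558, 0.554, 0.484, 0.661),
                 a2 = c(0.326, 0.373, 0.465, 0.434, 0.499),
                 a3 = c(0.450, 0.481, 0.613, 0.473, 0.504),
                 a4 = c(0.561, 0.681, 0.633, 0.504, 0.723))
dflong <- data.frame(
  ID   = rep(df$ID, each = 4),
  aa   = rep(c("a1", "a2", "a3", "a4"), times = 5),
  prop = as.vector(t(as.matrix(df[, -1])))
)

# Stand-in for arrange(ID, desc(prop)): sort by ID, then by falling prop.
dflong <- dflong[order(dflong$ID, -dflong$prop), ]

# Stand-in for fct_inorder(): levels in order of first appearance.
key <- paste(dflong$ID, dflong$aa, sep = ".")
dflong$group <- factor(key, levels = unique(key))

# The group levels for C follow C's own ranking, not the overall one.
group_levels_C <- grep("^C\\.", levels(dflong$group), value = TRUE)
print(group_levels_C)  # "C.a4" "C.a3" "C.a1" "C.a2"

# Within every ID, prop never rises as the group level index rises.
is_desc <- tapply(seq_len(nrow(dflong)), dflong$ID, function(i) {
  p <- dflong$prop[i][order(as.integer(dflong$group[i]))]
  all(diff(p) <= 0)
})
print(all(is_desc))    # TRUE
```

Because every ID gets its own run of group levels, the stacking order can differ from bar to bar, while fill still uses the same four aa levels, so the colours and legend stay consistent.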

stefan
  • Hi Stefan, you're a lifesaver! This solution worked perfectly. While I had tried to create a grouping column with my desired order, I hadn't thought of mapping it on the `group` aes. Thanks a million for your help. – faaa96 Oct 26 '22 at 04:42