
I'm trying to achieve a result like in the image below - such that columns in the graph are aligned based on the join between the middle two categories, rather than at one of the axes (i.e. the line between 'disagree' and 'agree' is in the same X coordinate for each item).

Target image

My code for a toy example is below:

library(ggplot2)
test_dat <- data.frame(question = rep(c('test1', 'test2'), each = 4), 
                       value = rep(c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree'), 2), 
                       percentage = c(10, 20, 5, 40, 15, 24, 30, 10), 
                       stringsAsFactors = FALSE)

test_dat$value <- factor(test_dat$value, levels = c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree')[4:1])

ggplot(test_dat, aes(x = question, fill = value, y = percentage)) + 
  geom_col(position = 'stack', width = .7) + 
  coord_flip()

But so far, I can't figure out how to stop it from only aligning on the x axis. I've considered hacking around and making, e.g. a dummy category with a transparent fill, but wondered if there's a route that I'm missing.

My current effort

OTStats
Sean Murphy
  • Ooh, I love these. Heiberger and Robbins call them [diverging stacked barcharts](https://www.jstatsoft.org/article/view/v057i05/v57i05.pdf). In the `HH` package, they have a `likertplot` function that implements these using lattice, which might be either a starting point, or perhaps even good enough. Also see – Aaron Oct 08 '19 at 14:22
  • And here's another SO question, using ggplot2. https://stackoverflow.com/q/49161918/210673 – Aaron Oct 08 '19 at 14:23
  • I was going to post the same link as @Aaron, but specifically [this answer](https://stackoverflow.com/a/49162153/5325862) to that post might be adaptable – camille Oct 08 '19 at 15:22

3 Answers


Adding the gap between positive and negative categories is actually pretty tricky. To do that, I had to build up shapes from scratch with geom_rect. I followed some of the advice from this answer. One of the problems I ran into was getting the categories to come out in the right order—I kept having "disagree" and "strongly disagree" reversed until I added a "strength" measure to make sure "strongly agree" and "strongly disagree" would both be placed at the extremes.

The main variation was to then add an offset to shift all positive values up by some amount and all negative values down by that same amount. I'd recommend you take the data manipulation steps apart line by line to get the hang of them—I certainly had to just to write it.

library(dplyr)
library(ggplot2)
test_dat <- data.frame(question = rep(c('test1', 'test2'), each = 4), 
                       value = rep(c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree'), 2), 
                       percentage = c(10, 20, 5, 40, 15, 24, 30, 10), 
                       stringsAsFactors = FALSE)

test_dat$value <- factor(test_dat$value, levels = c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree')[4:1])

gap <- 0.5
width <- 0.35
test_likert <- test_dat %>%
  mutate(question = forcats::as_factor(question),
         direction = ifelse(grepl("[Dd]isagree", value), -1, 1),  # -1 = disagree side
         xmin = as.numeric(question) - width,
         xmax = as.numeric(question) + width,
         strength = as.numeric(grepl("Strongly", value))) %>%
  group_by(question, direction) %>%
  arrange(strength, desc(value)) %>%
  mutate(ymax = cumsum(percentage) + gap,
         ymin = lag(ymax, default = gap)) %>%
  mutate_at(vars(ymin, ymax), ~. * direction)

head(test_likert)
#> # A tibble: 6 x 9
#> # Groups:   question, direction [4]
#>   question value      percentage direction  xmin  xmax strength  ymax  ymin
#>   <fct>    <fct>           <dbl>     <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1 test1    Disagree           20        -1  0.65  1.35        0 -20.5  -0.5
#> 2 test2    Disagree           24        -1  1.65  2.35        0 -24.5  -0.5
#> 3 test1    Agree               5         1  0.65  1.35        0   5.5   0.5
#> 4 test2    Agree              30         1  1.65  2.35        0  30.5   0.5
#> 5 test1    Strongly …         10        -1  0.65  1.35        1 -30.5 -20.5
#> 6 test2    Strongly …         15        -1  1.65  2.35        1 -39.5 -24.5

To get the plot, you now have your x & y positions for geom_rect. The x-scale is a little awkward in order to get text labels (geom_rect needs that scale to be continuous as far as I can tell).

Originally I'd left the y-scale alone, but having the gap will be misleading to readers (@MatiasAndina mentions the readability issue). You'd be placing bars ending at e.g. 30.5 where their values should actually be 30. One way to handle that is to manually set the scale breaks and label them with the offset taken out. That then puts two values labeled as 0, which is weird, but you do want a clear baseline position.

ggplot(test_likert, aes(fill = value)) +
  geom_rect(aes(ymin = ymin, ymax = ymax, xmin = xmin, xmax = xmax)) +
  coord_flip() +
  scale_x_continuous(labels = levels(test_likert$question), 
                     breaks = unique(as.numeric(test_likert$question))) +
  scale_y_continuous(labels = c(seq(45, 0, by = -15), seq(0, 45, by = 15)), 
                     breaks = c(seq(-45, 0, by = 15) - gap, seq(0, 45, by = 15) + gap),
                     limits = c(-48, 48))
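The breaks and labels above are hard-coded for a maximum of 45 and a step of 15. A small helper can generate them for any gap and step instead. `offset_breaks()` is a hypothetical name (not part of ggplot2), and the sketch assumes the same symmetric offset of `gap` on both sides of zero that the data preparation uses.

```r
# Hypothetical helper (not part of ggplot2): place breaks at the offset bar
# ends (value + gap on the right, -(value + gap) on the left) while the labels
# show the un-offset values, so 0 appears twice, once at each baseline.
offset_breaks <- function(max_value, step, gap) {
  values <- seq(0, max_value, by = step)
  list(
    breaks = c(-rev(values) - gap, values + gap),
    labels = c(rev(values), values)
  )
}

ax <- offset_breaks(45, 15, gap = 0.5)
ax$breaks
ax$labels
```

With the plot above, this would be used as `scale_y_continuous(breaks = ax$breaks, labels = ax$labels, limits = c(-48, 48))`.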

A better way to do the y-scale (and all around more legible for a stacked bar chart), which I'll let you handle, would be to forgo the y-scale breaks and put direct labels on each bar to show their actual values.
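A minimal sketch of the direct-label idea, assuming the same rectangle construction as in this answer (and that `forcats` is installed, as the answer already requires). Each label sits at the centre of its rectangle and prints the true percentage, so the offset never reaches the reader; the numeric breaks are removed with `breaks = NULL`. The column names `label_x` and `label_y` are illustrative additions.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question.
test_dat <- data.frame(question = rep(c('test1', 'test2'), each = 4),
                       value = rep(c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree'), 2),
                       percentage = c(10, 20, 5, 40, 15, 24, 30, 10),
                       stringsAsFactors = FALSE)
test_dat$value <- factor(test_dat$value,
                         levels = c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree')[4:1])

gap <- 0.5
width <- 0.35
test_likert <- test_dat %>%
  mutate(question = forcats::as_factor(question),
         direction = ifelse(grepl("[Dd]isagree", value), -1, 1),
         xmin = as.numeric(question) - width,
         xmax = as.numeric(question) + width,
         strength = as.numeric(grepl("Strongly", value))) %>%
  group_by(question, direction) %>%
  arrange(strength, desc(value)) %>%
  mutate(ymax = cumsum(percentage) + gap,
         ymin = lag(ymax, default = gap)) %>%
  mutate_at(vars(ymin, ymax), ~. * direction) %>%
  ungroup() %>%
  # Label position: the centre of each rectangle.
  mutate(label_x = (xmin + xmax) / 2,
         label_y = (ymin + ymax) / 2)

p <- ggplot(test_likert) +
  geom_rect(aes(ymin = ymin, ymax = ymax, xmin = xmin, xmax = xmax, fill = value)) +
  # Print the actual (un-offset) percentage in the middle of each segment.
  geom_text(aes(x = label_x, y = label_y, label = percentage)) +
  coord_flip() +
  scale_x_continuous(labels = levels(test_likert$question),
                     breaks = unique(as.numeric(test_likert$question))) +
  # No numeric y breaks: the direct labels carry the values.
  scale_y_continuous(breaks = NULL)
p
```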

camille
  • This is a great answer, thank you! I'd suspected I might need to use geom_rect but it's one of the few I hadn't played with before. Yes, I intend, as in the example, to add in text labels for the total percentage agree/disagree. I'll update with the final code I come up with, based on this, to make it pretty! – Sean Murphy Oct 08 '19 at 22:30

One way to align these values is at zero. If we convert the disagreeing responses to negative values, the responses within each question line up on either side of zero.

library(tidyverse)

test_dat %>% 
  mutate(percentage = if_else(value %in% c("Strongly disagree", "Disagree"), -1 * percentage, percentage)) %>% 
  ggplot(aes(x = question, fill = value, y = percentage)) + 
  geom_col(position = 'stack', width = .7) + 
  coord_flip()
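One caveat about ordering, assuming ggplot2 2.2.0 or later: `position_stack()` stacks negative values separately and, on each side, draws the *last* factor level closest to zero. With the question's levels, that puts "Strongly disagree" next to zero rather than "Disagree". A sketch that works around this by listing each side's outer category before its inner one, then restoring the natural legend order with the `breaks` argument of `scale_fill_manual()`. The hex colours are arbitrary choices, and the data are rebuilt so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

test_dat <- data.frame(question = rep(c('test1', 'test2'), each = 4),
                       value = rep(c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree'), 2),
                       percentage = c(10, 20, 5, 40, 15, 24, 30, 10),
                       stringsAsFactors = FALSE)

likert_levels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")

diverging <- test_dat %>%
  mutate(
    # Negative values for the disagreeing categories so they stack left of zero.
    percentage = if_else(value %in% c("Strongly disagree", "Disagree"),
                         -percentage, percentage),
    # Within each sign the last level is drawn next to zero, so the inner
    # categories (Agree, Disagree) come last on their side.
    value = factor(value, levels = c("Strongly agree", "Agree",
                                     "Strongly disagree", "Disagree"))
  )

p <- ggplot(diverging, aes(x = question, y = percentage, fill = value)) +
  geom_col(position = "stack", width = .7) +
  # Named values pin each category's colour; `breaks` sets the legend order.
  scale_fill_manual(values = c("Strongly disagree" = "#CA0020",
                               "Disagree" = "#F4A582",
                               "Agree" = "#92C5DE",
                               "Strongly agree" = "#0571B0"),
                    breaks = likert_levels)

p + coord_flip()
```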


OTStats
  • Is there a way to order the bars from Strongly disagree, disagree, agree to Strongly agree? – Dave2e Oct 08 '19 at 14:01
  • Converting to a factor should do the trick there. Is there any way to get a break between agree and disagree as in Matias' answer without just overwriting the columns with hline (and thus slightly altering the proportions of the columns by obscuring part of them)? – Sean Murphy Oct 08 '19 at 14:21

You may have to encode the "agree" responses as positive values and the "disagree" responses as negative values, with a separate geom_col() call for each. This visualization is somewhat difficult to read, but that is a different question.

library(dplyr)    # filter() here is dplyr's, not stats::filter()
library(ggplot2)

ggplot() +
  geom_col(data = filter(test_dat, value %in% c("Strongly agree", "Agree")),
           aes(question, percentage, fill = value)) +
  geom_col(data = filter(test_dat, value %in% c("Strongly disagree", "Disagree")),
           aes(question, -percentage, fill = value)) +
  coord_flip() +
  geom_hline(yintercept = 0, lwd = 2, color = "white")


Update

According to the comment, you can do something like this:

library(dplyr)  # for tibble() and bind_rows()

empty_table <- tibble(question = rep(c("test1", "test2"), 2),
                      value = sort(rep(c("empty Agree", "empty Disagree"), 2)),
                      percentage = 5)

test_dat <- test_dat %>%
  bind_rows(empty_table)

Adjust the percentage used for the white space as you wish. You will have to set the factor levels accordingly with mutate(value = factor(value, levels = c(CORRECT LEVELS HERE))).

You will need to set the fill with scale_fill_manual(). For example, if your levels were Strongly disagree, ..., empty1, empty2, ..., Strongly agree:

  scale_fill_manual(values = c("darkred", "red",
                               "white", "white",
                               "lightblue", "darkblue")) +

Use a theme with a white background, or change the color accordingly.
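A self-contained sketch of the same spacer idea, assuming ggplot2 2.2.0 or later, where `position_stack()` stacks negative values separately and draws the last factor level closest to zero on each side. The spacer categories `gap_agree` and `gap_disagree` are hypothetical names: they are negated along with the disagree rows, listed last on their side so they sit against zero, filled with `"transparent"`, and left out of the legend via `breaks`. The value of `gap` and the colours are arbitrary.

```r
library(dplyr)
library(ggplot2)

test_dat <- data.frame(question = rep(c('test1', 'test2'), each = 4),
                       value = rep(c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree'), 2),
                       percentage = c(10, 20, 5, 40, 15, 24, 30, 10),
                       stringsAsFactors = FALSE)

gap <- 0.5  # width of each invisible spacer, in percentage points

# One spacer per side for every question.
spacers <- expand.grid(question = unique(test_dat$question),
                       value = c("gap_disagree", "gap_agree"),
                       stringsAsFactors = FALSE)
spacers$percentage <- gap

padded <- bind_rows(test_dat, spacers) %>%
  mutate(
    percentage = if_else(value %in% c("Strongly disagree", "Disagree", "gap_disagree"),
                         -percentage, percentage),
    # The last level on each side is stacked next to zero: the spacers.
    value = factor(value, levels = c("Strongly agree", "Agree", "gap_agree",
                                     "Strongly disagree", "Disagree", "gap_disagree"))
  )

p <- ggplot(padded, aes(x = question, y = percentage, fill = value)) +
  geom_col(position = "stack", width = .7) +
  scale_fill_manual(values = c("Strongly disagree" = "#CA0020", "Disagree" = "#F4A582",
                               "Agree" = "#92C5DE", "Strongly agree" = "#0571B0",
                               "gap_agree" = "transparent", "gap_disagree" = "transparent"),
                    # Only the real categories appear in the legend.
                    breaks = c("Strongly disagree", "Disagree", "Agree", "Strongly agree"))

p + coord_flip()
```

The bar ends are shifted by `gap`, so the axis values are off by that amount; the offset breaks and labels from camille's answer, or dropping the breaks in favour of direct labels, apply here as well.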

Matias Andina
  • This is the right idea, but you can do it with one `geom_col` just by mutating percentage to be negative when desired. That is, `mutate(test_dat, percentage=ifelse(value %in% c("Strongly disagree", "Disagree"), -percentage, percentage))` – Aaron Oct 08 '19 at 14:33
  • Yes, I agree; I think both choices are valid. I would rather not overwrite the original data if it's not needed. It ends up being one more `mutate` line or one more `geom_col` line, no? – Matias Andina Oct 08 '19 at 14:37
  • @Aaron Also, OP seems not to like the `hline` biting into the categories, but I don't know if another option exists other than drawing the bars with `geom_rect()` and calculating the positions – Matias Andina Oct 08 '19 at 14:39
  • And I just now notice that OTStats's answer does what I suggest... Sigh. See also that they mutate and then pipe to ggplot so it doesn't overwrite the original. – Aaron Oct 08 '19 at 14:42
  • About your question, I'd probably try adding a dummy category with an invisible fill. If you keep the x-axes though, you'd need to set the breaks and labels with the correct offset. – Aaron Oct 08 '19 at 14:46
  • @Aaron Yes, see update, I think OP should be able to figure it out from all the answers – Matias Andina Oct 08 '19 at 15:31