
I have data from a questionnaire given before and after an educational module. I am trying to visualise the data with ggplot as a stacked bar chart, more specifically a stacked butterfly/tornado chart, where I want the two occasions (before = 1, after = 2) to act as horizontal facets. I have made a hand-drawn sketch of what I want, and I include what I currently get as well as my ggplot code.

What I have:

[Image: the current plot]

What I need:

[Image: hand-drawn sketch of the desired plot]

My code:

library(ggplot2)

likert_viz <- ggplot(all.gg.data, aes(x = number, y = item, fill = opinion)) +
  geom_col(width = 5.0) +
  geom_vline(xintercept = 50, color = "white") +
  scale_y_discrete(limits = all.gg.data$item, labels = scales::wrap_format(50)) +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Opinion", subtitle = "Subtitle", x = "Percentage", y = "", color = "") +
  facet_wrap(~occasion)

My data: (https://pastebin.com/CdgSseKJ)

I have experimented with facets, hoping to rearrange them the way I want, but I could not get the "before" answers stacked above the "after" answers for one question at a time. I am hoping for a helpful suggestion or solution.


This was fairly tricky. I found two solutions: (1) create one facet for the positive opinions and one for the negative ones, then remove the space between the facets; (2) explicitly define the start and end of each bar and pass them to geom_segment instead of geom_col.

Common part

Whichever method you prefer, much of the work is shared:

The first thing you asked for was to place the bars for the two occasions of each question next to each other. That could be done by adding position = "dodge" or "dodge2" inside geom_..., or by creating a new y axis with paste(item, "- Occasion", occasion). The first option is more elegant, but I couldn't get it to work, so both methods below use the second.
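A minimal sketch of the combined y-axis variable, using made-up item names (not the questionnaire items). It relies on ggplot2 ordering a character y variable alphabetically, which puts the two occasions of each item next to each other:

```r
# Hypothetical item names and occasions, deliberately out of order
item     <- c("Q2 Teamwork", "Q1 Confidence", "Q2 Teamwork", "Q1 Confidence")
occasion <- c(2, 1, 1, 2)

# One y-axis category per item/occasion pair
item2 <- paste(item, "- Occasion", occasion)

# Alphabetical order, as a discrete scale uses for a character column
levs <- sort(unique(item2))
print(levs)
```

Here levs lists both occasions of "Q1 Confidence" first, then both occasions of "Q2 Teamwork".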

First step: create the new y-axis variable and convert number into a proportion within each item and occasion (so that the x axis runs from -100% to 100%):

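library(dplyr)

# df: the raw data from the question (columns item, occasion, opinion, number)
all.gg.data2 = df %>%
  mutate(item2 = paste(item, "- Occasion", occasion)) %>%  # one y category per item and occasion
  group_by(item, occasion) %>%
  mutate(number = number / sum(number))                     # counts -> proportions within each group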
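For readers without dplyr, here is a base-R sketch of the same first step, using made-up counts for a single item (not the pastebin data). ave() divides each count by its item/occasion total, so every group sums to 1 (100%):

```r
# Made-up counts for one item at two occasions
df <- data.frame(
  item     = "Q1",
  occasion = rep(1:2, each = 5),
  opinion  = rep(c("SD", "D", "N", "A", "SA"), times = 2),
  number   = c(2, 3, 5, 6, 4,
               1, 1, 4, 8, 6)
)

# Base-R counterpart of group_by(item, occasion) + mutate(number / sum(number))
df$number <- ave(df$number, df$item, df$occasion,
                 FUN = function(v) v / sum(v))

# Totals per item (rows) and occasion (columns): all equal to 1
totals <- tapply(df$number, list(df$item, df$occasion), sum)
print(totals)
```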

Second step: because the neutral answers (N) should be centered on zero, split that opinion into two halves, a "negative part" (N-) and a "positive part" (N+):

all.gg.data.N = all.gg.data2 %>%
  filter(opinion == "N") %>%
  {rbind(mutate(., number = number/2, opinion = "N-"),
         mutate(., number =  number/2, opinion = "N+"))}
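The same split can be sketched in base R on one made-up item/occasion (hypothetical proportions). The point to check is that halving N and keeping both halves leaves the total unchanged:

```r
# Made-up proportions for one item/occasion (already summing to 1)
df <- data.frame(opinion = c("SD", "D", "N", "A", "SA"),
                 number  = c(0.10, 0.15, 0.25, 0.30, 0.20),
                 stringsAsFactors = FALSE)

# Split the neutral row into two equal halves, one per side of zero
n_row  <- df[df$opinion == "N", ]
n_half <- rbind(transform(n_row, number = number / 2, opinion = "N-"),
                transform(n_row, number = number / 2, opinion = "N+"))

# Replace the original N row with the two halves
df_split <- rbind(df[df$opinion != "N", ], n_half)
print(df_split)
```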

Third step: add the new N rows, create the scale variable, and transform the number column according to the chosen method. Fourth step: plot. From here on, the explanation is split by method.

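Note: both methods use a custom label function (based on scales::wrap_format) that shortens every second y-axis label (the occasion-2 label of each item) to just "- Occasion 2" and wraps long labels at 50 characters: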

my_wrap_format = function(x) {
  x[seq(2, length(x), 2)] = gsub(".+(- Occasion [1-2])", "\\1", x[seq(2, length(x), 2)])
  
  unlist(lapply(strwrap(x, width = 50, simplify = FALSE), 
                paste0, collapse = "\n"))}
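To see what my_wrap_format does, here it is applied to two hypothetical labels for the same question. It assumes the labels arrive as occasion-1/occasion-2 pairs, which the alphabetical ordering of item2 provides:

```r
my_wrap_format <- function(x) {
  # Keep only "- Occasion N" on every second label
  x[seq(2, length(x), 2)] <- gsub(".+(- Occasion [1-2])", "\\1", x[seq(2, length(x), 2)])
  # Wrap each label at 50 characters, joining the pieces with newlines
  unlist(lapply(strwrap(x, width = 50, simplify = FALSE),
                paste0, collapse = "\n"))
}

labels <- c(
  "How confident are you in explaining the core concepts to a colleague? - Occasion 1",
  "How confident are you in explaining the core concepts to a colleague? - Occasion 2"
)
out <- my_wrap_format(labels)
cat(out, sep = "\n\n")
```

The first label keeps the full question, wrapped over two lines; the second becomes just "- Occasion 2".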

Faceted geom_col method

Third step: change the sign of number for the negative part.

all.gg.data = all.gg.data2 %>%
  filter(opinion != "N") %>%
  rbind(all.gg.data.N) %>%
  mutate(opinion = factor(opinion, levels = c("SD", "D", "N-", "SA", "A", "N+")),
         scale = ifelse(opinion %in% c("SD", "D", "N-"), "-", "+"),
         number = ifelse(scale == "-", -number, number))
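A base-R sketch of what this third step does to one made-up item/occasion (hypothetical proportions). One assumption differs from the code above: scale is made a factor with levels c("-", "+"), so the negative panel is always the first (left) facet. Plain character values "-" and "+" may sort differently depending on the locale:

```r
# Made-up, already-split proportions for one item/occasion
opinion <- factor(c("SD", "D", "N-", "N+", "A", "SA"),
                  levels = c("SD", "D", "N-", "SA", "A", "N+"))
number  <- c(0.10, 0.15, 0.125, 0.125, 0.30, 0.20)

# Which facet each row goes to; the factor fixes the panel order
scale  <- factor(ifelse(opinion %in% c("SD", "D", "N-"), "-", "+"),
                 levels = c("-", "+"))

# Negative side points left of zero, positive side right of zero
signed <- ifelse(scale == "-", -number, number)
print(data.frame(opinion, scale, signed))
```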

Fourth step: create facets based on scale (facet_wrap(vars(scale))), remove the extra space along the x axis (expand = c(0, 0) and panel.spacing = unit(0, "cm")), and merge "N+" and "N-" into a single legend entry with scale_fill_manual.

ggplot(all.gg.data, aes(x = number, y = item2, fill = opinion)) +
  geom_col() +
  facet_wrap(vars(scale), ncol = 2, scales = "free_x") +
  scale_y_discrete(labels = my_wrap_format) +
  scale_x_continuous(labels = scales::percent, expand = c(0, 0)) +
  labs(title = "Opinion", subtitle = "Subtitle", x = "Percentage", y = "", color = "") +
  scale_fill_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                                `N+` = "grey", A = "red", SA = "darkred"),
                     labels = c("SD", "D", "N", "A", "SA"),
                     breaks = c("SD", "D", "N-", "A", "SA")) +
  theme(panel.spacing = unit(0, "cm"),
        strip.text = element_blank(),
        strip.background = element_blank())

Result: [image of the plot produced by the faceted geom_col method]
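A self-contained sketch of the faceted method on a tiny made-up data set (one hypothetical item, two occasions), useful for checking the layout before running it on the real data. It assumes ggplot2 3.3.0 or later, where geom_col detects the horizontal orientation from a discrete y, and uses the factor version of scale so the negative facet comes first:

```r
library(ggplot2)

# Made-up, already-transformed data for one item at two occasions
d <- data.frame(
  item2   = rep(c("Q1 - Occasion 1", "Q1 - Occasion 2"), each = 6),
  opinion = factor(rep(c("SD", "D", "N-", "N+", "A", "SA"), times = 2),
                   levels = c("SD", "D", "N-", "SA", "A", "N+")),
  number  = c(-0.10, -0.15, -0.125, 0.125, 0.30, 0.20,
              -0.05, -0.05, -0.100, 0.100, 0.40, 0.30)
)
d$scale <- factor(ifelse(d$opinion %in% c("SD", "D", "N-"), "-", "+"),
                  levels = c("-", "+"))

# Two touching facets: negative side on the left, positive side on the right
p <- ggplot(d, aes(x = number, y = item2, fill = opinion)) +
  geom_col() +
  facet_wrap(vars(scale), ncol = 2, scales = "free_x") +
  scale_x_continuous(labels = scales::percent, expand = c(0, 0)) +
  scale_fill_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                               `N+` = "grey", A = "red", SA = "darkred"),
                    breaks = c("SD", "D", "N-", "A", "SA"),
                    labels = c("SD", "D", "N", "A", "SA")) +
  theme(panel.spacing = unit(0, "cm"),
        strip.text = element_blank(), strip.background = element_blank())
print(p)
```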

geom_segment method

Third step: define the x and xend aesthetics for geom_segment. xend is the running sum of the number values, computed with purrr::accumulate: forward for the positive scale and backward for the negative one. x (numberStart) is the end of the adjacent bar closer to zero, shifted in opposite directions for the two scales, so that the bar touching zero starts at 0 on both sides.

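library(purrr)

# Running sum of the bar lengths: backward (from zero towards the left) for the
# negative scale, forward (from zero towards the right) for the positive one
my_accumulate = function(number, scale) {
  accumulate(number*ifelse(scale == "-", -1, 1), sum, .dir = ifelse(scale[1] == "-", "backward", "forward"))}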
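If purrr is not available, the same running sums can be sketched in base R, swapping purrr::accumulate for cumsum: rev(cumsum(rev(x))) is a backward running sum and cumsum(x) a forward one. The proportions below are made up:

```r
# Base-R stand-in for my_accumulate()
my_accumulate_base <- function(number, scale) {
  signed <- number * ifelse(scale == "-", -1, 1)
  if (scale[1] == "-") rev(cumsum(rev(signed))) else cumsum(signed)
}

# Made-up proportions for one item/occasion, already in factor order
neg_end <- my_accumulate_base(c(0.10, 0.15, 0.125), rep("-", 3))  # SD, D, N-
pos_end <- my_accumulate_base(c(0.125, 0.30, 0.20), rep("+", 3))  # N+, A, SA
print(neg_end)  # -0.375 -0.275 -0.125
print(pos_end)  #  0.125  0.425  0.625
```

On the negative side SD gets the far-left end (-0.375) and N- ends closest to zero, which is the order the bars need.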

all.gg.data = all.gg.data2 %>%
  filter(opinion != "N") %>%
  rbind(all.gg.data.N) %>%
  mutate(opinion = factor(opinion, levels = c("SD", "D", "N-", "N+", "A", "SA")),
         scale = ifelse(opinion %in% c("SD", "D", "N-"), "-", "+")) %>%
  arrange(item, occasion, opinion) %>%
  group_by(item, occasion, scale) %>%
  mutate(number = my_accumulate(number, scale),
         # start of each bar = end of the adjacent bar closer to zero (0 for the bar touching zero)
         numberStart = if (scale[1] == "-") dplyr::lead(number, default = 0) else dplyr::lag(number, default = 0))

Note: geom_col works out each bar's start and end points automatically; here they must be defined explicitly, which is why this method needs more code.
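A small base-R check of how the start points relate to the end points, for one made-up item/occasion. Shifting the ends by one position (the same as dplyr::lead(x, default = 0) on the negative side and dplyr::lag(x, default = 0) on the positive side) works for any number of bars per side, and the resulting segments tile the axis without gaps:

```r
# Bar ends for one made-up item/occasion (as produced by the running sums)
neg_end <- c(-0.375, -0.275, -0.125)  # SD, D, N-
pos_end <- c( 0.125,  0.425,  0.625)  # N+, A, SA

# Negative side: each bar starts where the next bar (closer to zero) ends;
# the bar touching zero starts at 0
neg_start <- c(neg_end[-1], 0)
# Positive side: mirror image of the above
pos_start <- c(0, pos_end[-length(pos_end)])

# Segment lengths recover the original proportions
len_neg <- abs(neg_end - neg_start)
len_pos <- abs(pos_end - pos_start)
print(len_neg)  # 0.100 0.150 0.125
print(len_pos)  # 0.125 0.300 0.200
```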

Fourth step: plot, merging "N+" and "N-" into a single legend entry with scale_color_manual.

ggplot(all.gg.data, aes(x = numberStart, xend = number,
                        y = item2, yend = item2, color = opinion)) +
  geom_segment(size = 2) +
  scale_y_discrete(labels = my_wrap_format) +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Opinion", subtitle = "Subtitle", x = "Percentage", y = "", color = "") +
  scale_color_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                       `N+` = "grey", A = "red", SA = "darkred"),
                     labels = c("SD", "D", "N", "A", "SA"),
                     breaks = c("SD", "D", "N-", "A", "SA"))

Result: [image of the plot produced by the geom_segment method]
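A self-contained sketch of the geom_segment method on made-up start and end points (one hypothetical item, two occasions). It assumes ggplot2 3.4.0 or later, where segment thickness is set with linewidth instead of the older size argument used above:

```r
library(ggplot2)

# Made-up start/end points for one item at two occasions
d <- data.frame(
  item2   = rep(c("Q1 - Occasion 1", "Q1 - Occasion 2"), each = 6),
  opinion = factor(rep(c("SD", "D", "N-", "N+", "A", "SA"), times = 2),
                   levels = c("SD", "D", "N-", "N+", "A", "SA")),
  start   = c(-0.275, -0.125, 0, 0, 0.125, 0.425,
              -0.15,  -0.10,  0, 0, 0.10,  0.50),
  end     = c(-0.375, -0.275, -0.125, 0.125, 0.425, 0.625,
              -0.20,  -0.15,  -0.10,  0.10,  0.50,  0.80)
)

# One thick horizontal segment per opinion; N- and N+ share a legend entry
p <- ggplot(d, aes(x = start, xend = end, y = item2, yend = item2,
                   color = opinion)) +
  geom_segment(linewidth = 10) +
  scale_x_continuous(labels = scales::percent) +
  scale_color_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                                `N+` = "grey", A = "red", SA = "darkred"),
                     breaks = c("SD", "D", "N-", "A", "SA"),
                     labels = c("SD", "D", "N", "A", "SA"))
print(p)
```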

Appendix: full code

geom_segment method:

library(dplyr)
library(purrr)
library(ggplot2)

my_wrap_format = function(x) {
  x[seq(2, length(x), 2)] = gsub(".+(- Occasion [1-2])", "\\1", x[seq(2, length(x), 2)])
  
  unlist(lapply(strwrap(x, width = 50, simplify = FALSE), 
                paste0, collapse = "\n"))}

my_accumulate = function(number, scale) {
  accumulate(number*ifelse(scale == "-", -1, 1), sum, .dir = ifelse(scale[1] == "-", "backward", "forward"))}

all.gg.data2 = df %>%
  mutate(item2 = paste(item, "\n- Occasion", occasion)) %>%
  group_by(item, occasion) %>%
  mutate(number = number / sum(number))

all.gg.data.N = all.gg.data2 %>%
  filter(opinion == "N") %>%
  {rbind(mutate(., number = number/2, opinion = "N-"),
         mutate(., number =  number/2, opinion = "N+"))}

all.gg.data = all.gg.data2 %>%
  filter(opinion != "N") %>%
  rbind(all.gg.data.N) %>%
  mutate(opinion = factor(opinion, levels = c("SD", "D", "N-", "N+", "A", "SA")),
         scale = ifelse(opinion %in% c("SD", "D", "N-"), "-", "+")) %>%
  arrange(item, occasion, opinion) %>%
  group_by(item, occasion, scale) %>%
  mutate(number = my_accumulate(number, scale),
         # start of each bar = end of the adjacent bar closer to zero (0 for the bar touching zero)
         numberStart = if (scale[1] == "-") dplyr::lead(number, default = 0) else dplyr::lag(number, default = 0))

ggplot(all.gg.data, aes(x = numberStart, xend = number,
                        y = item2, yend = item2, color = opinion)) +
  geom_segment(size = 10) +
  scale_y_discrete(labels = my_wrap_format) +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Opinion", subtitle = "Subtitle", x = "Percentage", y = "", color = "") +
  scale_color_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                       `N+` = "grey", A = "red", SA = "darkred"),
                     labels = c("SD", "D", "N", "A", "SA"),
                     breaks = c("SD", "D", "N-", "A", "SA"))

Faceted geom_col method:

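library(dplyr)
library(ggplot2)
# my_wrap_format() is the same label function defined in the geom_segment code above

all.gg.data2 = df %>%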
  mutate(item2 = paste(item, "\n- Occasion", occasion)) %>%
  group_by(item, occasion) %>%
  mutate(number = number / sum(number))

all.gg.data.N = all.gg.data2 %>%
  filter(opinion == "N") %>%
  {rbind(mutate(., number = number/2, opinion = "N-"),
         mutate(., number =  number/2, opinion = "N+"))}

all.gg.data = all.gg.data2 %>%
  filter(opinion != "N") %>%
  rbind(all.gg.data.N) %>%
  mutate(opinion = factor(opinion, levels = c("SD", "D", "N-", "SA", "A", "N+")),
         scale = ifelse(opinion %in% c("SD", "D", "N-"), "-", "+"),
         number = ifelse(scale == "-", -number, number))

ggplot(all.gg.data, aes(x = number, y = item2, fill = opinion)) +
  geom_col() +
  facet_wrap(vars(scale), ncol = 2, scales = "free_x") +
  scale_y_discrete(labels = my_wrap_format) +
  scale_x_continuous(labels = scales::percent, expand = c(0, 0)) +
  labs(title = "Opinion", subtitle = "Subtitle", x = "Percentage", y = "", color = "") +
  scale_fill_manual(values = c(SD = "darkblue", D = "blue", `N-` = "grey",
                                `N+` = "grey", A = "red", SA = "darkred"),
                     labels = c("SD", "D", "N", "A", "SA"),
                     breaks = c("SD", "D", "N-", "A", "SA")) +
  theme(panel.spacing = unit(0, "cm"), panel.border = element_blank(),
        strip.text = element_blank(), strip.background = element_blank())
Thank you for taking the time to answer! It almost solved my problem, but not quite, so I will use your solution when I revisit this with the aim of doing it entirely in Posit/RStudio. Unfortunately, I ran out of time and had to make the plot in Excel. – Nervous Optic Head Jan 04 '23 at 10:01