
I'd like to turn the following ggplot boxplot into a barplot while keeping the labels for n, mean, and p. I tried replacing `geom_boxplot` with `geom_barplot`, which didn't work. Does anyone have an idea how to change it? Thank you very much in advance! I'm grateful for any hints.

enter image description here

library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

p <- ggplot(data_stacked, aes(x = factor, y = attitude, fill = level)) +
  geom_boxplot() +
  scale_fill_manual(values = c("dimgrey", "lightgrey")) +
  # p-value comparing "high" vs. "low" within each factor
  stat_compare_means(aes(group = level), label = "p.format", vjust = 5) +
  # mean label for each group
  stat_summary(
    fun.data = function(x)
      data.frame(y = 5, label = paste("mean=", round(mean(x), 2))),
    geom = "text",
    aes(group = level),
    hjust = 0.5, vjust = 6.9,
    position = position_dodge(0.9)
  ) +
  # sample-size label for each group, pinned to the top of the panel
  stat_summary(
    fun.data = function(x)
      data.frame(y = Inf, label = paste("n=", length(x))),
    geom = "text",
    aes(group = level),
    hjust = 0.5, vjust = 1.2,
    position = position_dodge(0.9)
  )

p + theme_bw(base_size = 12) +
  ylim("strongly disagree = 1", "disagree = 2", "neither agree nor disagree = 3", "agree = 4", "strongly agree = 5")



I tried to change `geom_boxplot` to `geom_bar`, which didn't work. I also tried the ggpubr package, but I couldn't get it to compute both error bars and significance levels without an error message.
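A likely reason the swap fails is that `geom_bar()` counts rows by default (`stat = "count"`) and therefore rejects a mapped `y` aesthetic. The sketch below is one possible way to draw bars whose height is the group mean, using `stat_summary()` with ggplot2's `mean_se` helper for the error bars. The data frame `toy` and the object name `p_bar` are made up for illustration; only the column names follow `data_stacked`.

```r
library(ggplot2)

# Toy data with the same columns as data_stacked (values are made up)
set.seed(42)
toy <- data.frame(
  attitude = sample(1:5, 600, replace = TRUE),
  factor   = sample(c("authority", "fairness", "public_oversight"),
                    600, replace = TRUE),
  level    = sample(c("high", "low"), 600, replace = TRUE)
)

p_bar <- ggplot(toy, aes(x = factor, y = attitude, fill = level)) +
  # bar height = mean attitude of each factor/level group
  stat_summary(fun = mean, geom = "bar", position = position_dodge(0.9)) +
  # error bars = mean +/- 1 standard error
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = position_dodge(0.9), width = 0.2) +
  scale_fill_manual(values = c("dimgrey", "lightgrey")) +
  theme_bw(base_size = 12)

p_bar
```

The `stat_compare_means()` and text `stat_summary()` layers from the boxplot code are separate layers, so in principle they can be added to `p_bar` in the same way, although their `vjust` and `y` values would probably need adjusting.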

dput(data_stacked)

"high", "high", "high", "high", "high", "high", "high", "high", "high", "low", "low", "low", "low", "low", "low", "low", "low", "low", "low", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "low", "low", "low", "low", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", "high", ..... row.names = c(NA, -8592L), class = c("tbl_df", "tbl", "data.frame"))

class(data_stacked)
[1] "tbl_df"     "tbl"        "data.frame"

> head(data_stacked)
# A tibble: 6 × 3
  attitude factor           level
     <dbl> <chr>            <chr>
1        4 public_oversight high 
2        4 public_oversight high 
3        4 public_oversight high 
4        4 authority        high 
5        5 authority        low 
6        4 fairness         high 

I need a barplot similar to this one: enter image description here

acw
  • Can you provide `dput(data_stacked)` so your example is [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – jrcalabrese Feb 26 '23 at 16:58
  • I don't think a box plot or a bar plot are the best ways to show Likert data. Perhaps a plot like [this](https://stackoverflow.com/questions/70837348/connect-stack-bar-charts-with-multiple-groups-with-lines-or-segments-using-ggplo) would work better for you? In any case, some sample data would certainly improve your chances of a useful answer. – Allan Cameron Feb 26 '23 at 17:31

1 Answer

We don't have your data, but I suppose we can infer its structure from your code and your plot image. If we take the following data frame:

set.seed(1)

data_stacked <- data.frame(attitude = sample(1:5, 8750, TRUE),
                           factor = sample(c("authority", "fairness",
                                             "public_oversight"), 8750, TRUE),
                           level = sample(c("high", "low"), 8750, TRUE))

Then your own code runs to produce this plot:

enter image description here

Instead of this, we can make stacked bars to represent the Likert data as follows:

library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

ggplot(data_stacked, aes(x = level, fill = factor(attitude))) +
  geom_bar(position = position_fill(reverse = TRUE)) +
  stat_compare_means(aes(y = attitude, group = level), label = "p.format",
                     hjust = 0.5, label.y = 1.05, label.x = 1.5) +
  facet_grid(.~factor, switch = "x") +
  scale_fill_grey(labels = c("strongly disagree = 1", 
                             "disagree = 2", 
                             "neither agree nor disagree = 3", 
                             "agree = 4", 
                             "strongly agree = 5"), name = "Attitude") +
  theme_bw(base_size = 16) +
  scale_y_continuous(labels = scales::percent, breaks = 0:10 / 10) +
  scale_x_discrete(expand = c(0.2, 0.5), position = "top", name = NULL) +
  theme(panel.border = element_blank(),
        axis.line.y = element_line(),
        panel.spacing.x = unit(0, "mm"),
        strip.background = element_blank()) +
  stat_summary(
    fun.data = function(x)
      data.frame(y = 1.1, label = paste("mean=", round(mean(x), 2))),
    geom = "text",
    aes(y = attitude, group = level),
    hjust = 0.5,
    position = position_dodge(0.9)
  ) +
  stat_summary(
    fun.data = function(x)
      data.frame(y = 1.15, label = paste("n=", length(x))),
    geom = "text",
    aes(y = attitude, group = level),
    hjust = 0.5, vjust = 1.2,
    position = position_dodge(0.9)
  ) 

enter image description here
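To cross-check the n, mean and p labels drawn on the plot, the same quantities can be computed directly in base R. This is only a minimal sketch on the simulated data from above. It assumes the p labels come from a Wilcoxon rank-sum test, which, as far as I know, is what `stat_compare_means()` uses by default when two groups are compared. The object names `group_n`, `group_mean` and `p_values` are made up for illustration.

```r
# Same simulated data as above
set.seed(1)
data_stacked <- data.frame(attitude = sample(1:5, 8750, TRUE),
                           factor = sample(c("authority", "fairness",
                                             "public_oversight"), 8750, TRUE),
                           level = sample(c("high", "low"), 8750, TRUE))

# n and mean for every factor/level combination (3 x 2 tables)
group_n    <- with(data_stacked, tapply(attitude, list(factor, level), length))
group_mean <- with(data_stacked, tapply(attitude, list(factor, level), mean))

# One "high" vs. "low" test per factor (assumed to match the plot labels)
p_values <- sapply(
  split(data_stacked, data_stacked$factor),
  function(d) wilcox.test(attitude ~ level, data = d)$p.value
)

print(group_n)
print(round(group_mean, 2))
print(signif(p_values, 2))
```

If the printed values differ from the labels on the plot, the first things to check are the test method and the grouping used in `stat_compare_means()`.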

Allan Cameron
  • Thank you very much Allan! This is really helpful. I need to show the significant differences in the means between high and low treatment groups. How can I create a simple barplot for that? Thank you! – acw Feb 26 '23 at 21:01
  • That's what this is @acw - a stacked bar plot with labels showing the n, mean and p value between high and low for the 3 different factors. I'm not sure what you want to change? – Allan Cameron Feb 26 '23 at 21:13
  • @acw if you mean a "dynamite stick" plot such as the one you show in your example, this is really _not_ a great way to visualise your data (or indeed any data - see [here](http://emdbolker.wikidot.com/blog%3Adynamite) for discussion). For Likert data such as yours, showing all the counts in segments is a great way to demonstrate all the data without making the plot too busy. If you wanted, you could easily add a little dot/line range in the middle of each stack to show the mean / standard error, but honestly a dynamite stick plot just seems much worse from a communication / data viz perspective – Allan Cameron Feb 26 '23 at 21:24
  • Thanks Allan! I agree with you and I'll use the stacked bar plot. How can I add the dot line to show the mean? And how can I move the "high" and "low" labels to the x-axis at the bottom? Thanks so much! – acw Feb 27 '23 at 12:49
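The two follow-up requests in the comments (a dot/line range for the mean, and the "high"/"low" labels at the bottom) could be sketched roughly as below. This is not code from the answer. It rests on one assumption that should be stated plainly: the stacked bars live on a 0 to 1 proportion axis, so the mean attitude (1 to 5) is rescaled with `(attitude - 1) / 4` and read off a secondary axis. The object name `p_stack` and the axis titles are made up. Leaving out `position = "top"` in `scale_x_discrete()` keeps the "high"/"low" labels on the bottom axis, and `strip.placement = "outside"` puts the factor strips below them.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(1)
data_stacked <- data.frame(attitude = sample(1:5, 8750, TRUE),
                           factor = sample(c("authority", "fairness",
                                             "public_oversight"), 8750, TRUE),
                           level = sample(c("high", "low"), 8750, TRUE))

p_stack <- ggplot(data_stacked, aes(x = level)) +
  # stacked Likert segments, filled to 100%
  geom_bar(aes(fill = factor(attitude)),
           position = position_fill(reverse = TRUE)) +
  # mean +/- 1 SE, rescaled from 1..5 to 0..1 (assumption, see text above)
  stat_summary(aes(y = (attitude - 1) / 4), fun.data = mean_se,
               geom = "pointrange", colour = "red") +
  facet_grid(. ~ factor, switch = "x") +
  scale_fill_grey(name = "Attitude") +
  scale_y_continuous(name = "Share of responses",
                     labels = scales::percent,
                     # secondary axis maps 0..1 back to the 1..5 scale
                     sec.axis = sec_axis(~ . * 4 + 1, name = "Mean attitude")) +
  scale_x_discrete(name = NULL) +  # default position: labels at the bottom
  theme_bw(base_size = 16) +
  theme(strip.placement = "outside",
        strip.background = element_blank())

p_stack
```

With standard errors this small the line range may be hidden behind the dot, so a wider interval or a larger point may be easier to read.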