
I am a new user of R (and Stack Overflow, so excuse my formatting in advance) and am having trouble making a bar plot using ggplot2 from the tidyverse.

I need to make a bar plot with multiple columns on the x-axis and two subgroups, and a mean value on the y-axis.

My data looks like the following

# A tibble: 6 x 5
     Id Baseline  Tyr1  Tyr2 Time 
  <dbl>    <dbl> <dbl> <dbl> <chr>
1     1    0.536 0.172 0.141 pre  
2     2    0.428 0.046 0.084 post 
3     3    0.077 0.015 0.063 pre  
4     4    0.2   0.052 0.041 post 
5     5    0.161 0.058 0.039 pre  
6     6    0.219 0.059 0.05  post  
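
For anyone who wants to run the code in the answers, the printout above can be rebuilt as a data frame roughly as follows. This is only a sketch: the object name `dat` is borrowed from the first answer, and storing `Id` as an integer is an assumption.

```r
# Rebuild the six rows shown in the printout above.
# The name `dat` and the integer Id column are assumptions for illustration.
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.200, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.050),
  Time     = rep(c("pre", "post"), times = 3)
)

# Inspect the structure to confirm the column types
str(dat)
```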

I want to plot a bar graph with Baseline, Tyr1 and Tyr2 on the x-axis, each split into subgroups by Time, and the means on the y-axis. I believe I can use the fill aesthetic to make the subgroups; however, I can't find a way to get all my columns on the x-axis.

Goal is to make it look something like the following (I am not sure if the picture is getting uploaded?):

danlooo
desseper

2 Answers


The best way to achieve this is to reshape your dataset to long format, and then use position_dodge to separate the bars corresponding to different times. So:

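library(ggplot2)
library(tidyr)  # provides pivot_longer() and re-exports %>%

dat %>% 
  pivot_longer(cols = -c(Id, Time)) %>%   # one row per Id and measurement
  ggplot(aes(x = name, y = value, fill = Time, group = Time)) + 
    # with no summary function supplied, stat_summary() defaults to mean_se()
    stat_summary(geom = "col", width = 0.8, position = position_dodge()) + 
    stat_summary(geom = "errorbar", width = 0.2, position = position_dodge(width = 0.8))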
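
To see the numbers behind the bars and error bars, the same summary can be computed by hand. The sketch below assumes the default `mean_se()` summary (the mean plus or minus one standard error) and rebuilds the question's data under the assumed name `dat`; the column names `mean` and `se` are illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question (object name `dat` assumed)
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.200, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.050),
  Time     = rep(c("pre", "post"), times = 3)
)

# Mean and standard error of the mean for every measurement/Time combination
summary_tbl <- dat %>%
  pivot_longer(cols = -c(Id, Time)) %>%
  group_by(name, Time) %>%
  summarise(
    mean = mean(value),
    se   = sd(value) / sqrt(n()),  # standard error of the mean
    .groups = "drop"
  )

summary_tbl
```

Each bar height should match the `mean` column, and each error bar should span `mean - se` to `mean + se`.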


Consider also adding the data points for extra transparency. It will be easier for your readers to understand the data and judge its meaning if they can see the individual points:

dat %>% 
  pivot_longer(cols=-c(Id,Time)) %>% 
  ggplot(aes(x=name, y=value, fill=Time, group=Time)) + 
    stat_summary(geom="col",width=0.8,position=position_dodge()) + 
    stat_summary(geom="errorbar",width=0.2,position=position_dodge(width=0.8)) + 
    geom_point(position = position_dodge(width=0.8))


In response to the comment asking how to re-order the groups and set the colours: the code below converts Time to a factor with an explicit level order (so "pre" comes before "post"), sets the limits of the x-axis to control the order of the groups, and uses scale_fill_manual to set the bar colours.

library(dplyr)  # for mutate()

dat %>% 
  mutate(Time = factor(Time, levels = c("pre", "post"))) %>%   # "pre" before "post"
  pivot_longer(cols = -c(Id, Time)) %>% 
  ggplot(aes(x = name, y = value, fill = Time, group = Time)) + 
    stat_summary(geom = "col", width = 0.8, position = position_dodge()) + 
    stat_summary(geom = "errorbar", width = 0.2, position = position_dodge(width = 0.8)) + 
    geom_point(position = position_dodge(width = 0.8)) + 
    scale_x_discrete(limits = c("Tyr1", "Tyr2", "Baseline")) +   # order of the x-axis groups
    scale_fill_manual(values = c(pre = "lightgrey", post = "darkgrey"))
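
An alternative way to control the x-axis order, sketched here under the same assumptions about the data (object name `dat`), is to make `name` a factor as well, since ggplot2 draws discrete axes in factor-level order. The level order used here simply mirrors the limits given above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question (object name `dat` assumed)
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.200, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.050),
  Time     = rep(c("pre", "post"), times = 3)
)

# Fix the display order of both grouping variables through factor levels
long <- dat %>%
  pivot_longer(cols = -c(Id, Time)) %>%
  mutate(
    Time = factor(Time, levels = c("pre", "post")),
    name = factor(name, levels = c("Tyr1", "Tyr2", "Baseline"))
  )

# Build the plot object; call print(p) to draw it
p <- ggplot(long, aes(x = name, y = value, fill = Time, group = Time)) +
  stat_summary(fun = mean, geom = "col", width = 0.8, position = position_dodge()) +
  scale_fill_manual(values = c(pre = "lightgrey", post = "darkgrey"))
```

Setting the levels in the data keeps the ordering consistent across every plot made from `long`, whereas `scale_x_discrete(limits = ...)` only affects the one plot it is added to.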


To understand what is happening with the reshape, the intermediate dataset looks like this:

> dat %>% 
+   pivot_longer(cols=-c(Id,Time)) 
# A tibble: 18 x 4
      Id Time  name     value
   <int> <chr> <chr>    <dbl>
 1     1 pre   Baseline 0.536
 2     1 pre   Tyr1     0.172
 3     1 pre   Tyr2     0.141
 4     2 post  Baseline 0.428
 5     2 post  Tyr1     0.046
 6     2 post  Tyr2     0.084
 7     3 pre   Baseline 0.077
 8     3 pre   Tyr1     0.015
 9     3 pre   Tyr2     0.063
10     4 post  Baseline 0.2  
11     4 post  Tyr1     0.052
12     4 post  Tyr2     0.041
13     5 pre   Baseline 0.161
14     5 pre   Tyr1     0.058
15     5 pre   Tyr2     0.039
16     6 post  Baseline 0.219
17     6 post  Tyr1     0.059
18     6 post  Tyr2     0.05
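
The default column names `name` and `value` can be replaced with more descriptive ones. A minimal sketch, assuming the question's data is stored as `dat`; the names `Measure` and `Value` are hypothetical choices for illustration:

```r
library(tidyr)

# Data from the question (object name `dat` assumed)
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.200, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.050),
  Time     = rep(c("pre", "post"), times = 3)
)

# Name the pivoted columns explicitly and choose the output column names
long <- pivot_longer(
  dat,
  cols      = c(Baseline, Tyr1, Tyr2),
  names_to  = "Measure",  # hypothetical name for the former column headers
  values_to = "Value"     # hypothetical name for the cell values
)

dim(long)
```

The plotting code would then use `aes(x = Measure, y = Value, ...)` in place of `name` and `value`.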
George Savva
  • Great answer, this makes sense with the reshaping of the dataset. Thank you for including the reshaped dataset. Cheers :-) – desseper Apr 08 '22 at 09:16
  • No problem. Although I answered the question as asked, I should say that this kind of graph isn't really good practice. If you can, it is better to show the individual data points as well as the summaries, perhaps by adding a `geom_point()` layer to the code. – George Savva Apr 08 '22 at 09:18
  • I will consider adding the points. Is this to clarify the spread/variance of the data further than the line does - and does this error bar in this case refer to the standard error of the mean? – desseper Apr 08 '22 at 09:29
  • Yes, the default behaviour of `stat_summary()` with the `errorbar` geom in this case is mean ± standard error. Adding the points is a more transparent/honest way of showing the data. – George Savva Apr 08 '22 at 09:35
  • I have added the graph with the points. – George Savva Apr 08 '22 at 09:39
  • First of all, thank you for your help so far. I have a few questions about this graph that I'm having trouble with: how can I manually reposition the bars on the x-axis (right now the order is alphabetical)? Also, within each group I would prefer to have the "pre" group shown before the "post" group; right now this is also alphabetical. And a last question: how do I change the fill colours manually, say to grey and dark grey, while still having two different colours? – desseper Apr 11 '22 at 10:42
  • I have updated the answer – George Savva Apr 11 '22 at 11:53

First gather the Baseline, Tyr1 and Tyr2 columns into key/value pairs and calculate the mean for each group:

library(dplyr)
library(tidyr)

# df is the data frame from the question; column names are case-sensitive
long <- gather(df, key, value, -Id, -Time) %>%
  group_by(key, Time) %>%
  summarise(value = mean(value), .groups = "drop")

Then plot using Time as the group, and dodging the columns:

library(ggplot2)

ggplot(long, aes(x = key, y = value, group = Time, fill = Time)) + 
  geom_col(position = "dodge") +
  labs(y = "mean")


nogbad
  • Your graph isn't showing the group means; I think it is overlaying the values, which effectively shows the maximum in each group. – George Savva Apr 08 '22 at 09:02
  • That is true, I've changed it – nogbad Apr 08 '22 at 09:10
  • Thank you for your answer! Can you explain what the "%>%" part does? I have not met this in R before and see that it is used in both approaches to reshaping the data – desseper Apr 08 '22 at 09:20
  • It's the pipe operator from magrittr, which dplyr re-exports (I should have specified that I was using dplyr, sorry). It pipes the thing on the left-hand side of %>% into the function call on the right-hand side. See this question: https://stackoverflow.com/questions/24536154/what-does-mean-in-r – nogbad Apr 08 '22 at 09:25
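
As a small illustration of the pipe described in the last comment, the sketch below compares a piped expression with the equivalent nested call. It loads magrittr directly (dplyr and tidyr re-export the same operator); the vector `x` is just the three "pre" Baseline values from the question, used as example input.

```r
library(magrittr)

# Example input: the three "pre" Baseline values from the question
x <- c(0.536, 0.077, 0.161)

# Piped form: x is passed as the first argument of mean(),
# and that result as the first argument of round()
piped <- x %>% mean() %>% round(2)

# Equivalent nested form without the pipe
nested <- round(mean(x), 2)

# Both forms give the same result
identical(piped, nested)
```

The piped form reads left to right in the order the steps happen, which is why both answers use it for the reshape-then-plot sequence.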