
I have a dataset that looks like the one below. I am trying to make a barplot with all the variables side by side on the x axis, the bars grouped by gender (gender as the fill, with a different color per group), and the mean value of each variable on the y axis (the values are percentages).

tea                coke            beer             water           gender
14.55              26.50793651     22.53968254      40              1
24.92997199        24.50980392     26.05042017      24.50980393     2
23.03732304        30.63063063     25.41827542      20.91377091     1   
225.51781276       24.6064623      24.85501243      50.80645161     1
24.53662842        26.03706973     25.24271845      24.18358341     2   

In the end I want to get a barplot like this: [image]

Any suggestions on how to do that? I searched, but I only found examples with a factor on the x axis, not several variables grouped by a factor. Any help will be appreciated!

roscoe1895
  • You need to be clearer. To clarify, you want beverage along the x-axis like valence is in your example. You want the bars coloured by gender like low and high depressive systems are here, and the height of the bars is defined by the mean of each column by gender, correct? – Christie Haskell Marsh Mar 10 '14 at 16:00
  • hey crmhaske. I want the colors to represent gender, and x axis will display variables of beverage, yes! :) – roscoe1895 Mar 10 '14 at 16:10
  • Also, you should go back and delete your other question that you posted asking the same thing. It's better practice to edit an old question with new information than to post a duplicate question. – Christie Haskell Marsh Mar 10 '14 at 16:24

3 Answers

53

You can use aggregate to calculate the means:

means<-aggregate(df,by=list(df$gender),mean)
Group.1      tea     coke     beer    water gender
1       1 87.70171 27.24834 24.27099 37.24007      1
2       2 24.73330 25.27344 25.64657 24.34669      2

Get rid of the Group.1 column, which duplicates gender:

means<-means[,2:length(means)]
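
As a possible shortcut for the two steps above, the formula interface of aggregate groups by gender directly and never creates a Group.1 column. A minimal sketch, assuming df holds the five sample rows from the question (retyped here so the block runs on its own):

```r
# Sample data copied from the question
df <- data.frame(
  tea    = c(14.55, 24.92997199, 23.03732304, 225.51781276, 24.53662842),
  coke   = c(26.50793651, 24.50980392, 30.63063063, 24.6064623, 26.03706973),
  beer   = c(22.53968254, 26.05042017, 25.41827542, 24.85501243, 25.24271845),
  water  = c(40, 24.50980393, 20.91377091, 50.80645161, 24.18358341),
  gender = c(1, 2, 1, 1, 2)
)

# "." stands for every column other than the grouping variable
means <- aggregate(. ~ gender, data = df, FUN = mean)
print(means)
```

With this form the gender column comes first instead of last, which should not matter for the reshaping step because gender is referred to by name there.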

Then you have to reshape the data into long format:

library(reshape2)
means.long<-melt(means,id.vars="gender")
  gender variable    value
1      1      tea 87.70171
2      2      tea 24.73330
3      1     coke 27.24834
4      2     coke 25.27344
5      1     beer 24.27099
6      2     beer 25.64657
7      1    water 37.24007
8      2    water 24.34669
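
If you would rather not depend on reshape2 (the package is retired and its maintainers point to tidyr), the same reshaping can probably be done with tidyr::pivot_longer. This is a sketch under the assumption that means looks like the table above; the values are retyped from that output, so they are rounded:

```r
library(tidyr)

# Group means typed in from the aggregate() output above (rounded)
means <- data.frame(
  tea    = c(87.70171, 24.73330),
  coke   = c(27.24834, 25.27344),
  beer   = c(24.27099, 25.64657),
  water  = c(37.24007, 24.34669),
  gender = c(1, 2)
)

# One row per (gender, beverage) pair
means.long <- pivot_longer(means, cols = -gender,
                           names_to = "variable", values_to = "value")

# pivot_longer returns the names as character; a factor keeps the
# beverages in their original column order on the x axis
means.long$variable <- factor(means.long$variable,
                              levels = c("tea", "coke", "beer", "water"))
print(means.long)
```

The rows come out ordered by gender rather than by beverage, which makes no difference to the plot.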

Finally, you can use ggplot2 to create your plot:

library(ggplot2)
ggplot(means.long,aes(x=variable,y=value,fill=factor(gender)))+
  geom_bar(stat="identity",position="dodge")+
  scale_fill_discrete(name="Gender",
                      breaks=c(1, 2),
                      labels=c("Male", "Female"))+
  xlab("Beverage")+ylab("Mean Percentage")

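
If you also want the value printed above each bar, one way that usually works is to give geom_text the same dodge width as the bars. The sketch below is an illustration, not part of the original plot: it retypes means.long from the output above, uses geom_col (equivalent to geom_bar(stat="identity"), available since ggplot2 2.2.0), and the rounding and vjust values are arbitrary choices.

```r
library(ggplot2)

# Long-format means typed in from the melt() output above (rounded)
means.long <- data.frame(
  gender   = rep(c(1, 2), times = 4),
  variable = factor(rep(c("tea", "coke", "beer", "water"), each = 2),
                    levels = c("tea", "coke", "beer", "water")),
  value    = c(87.70171, 24.73330, 27.24834, 25.27344,
               24.27099, 25.64657, 37.24007, 24.34669)
)

# Bars and labels share one dodge width so each label sits over its own bar
dodge <- position_dodge(width = 0.9)

p <- ggplot(means.long, aes(x = variable, y = value, fill = factor(gender))) +
  geom_col(position = dodge) +
  geom_text(aes(label = round(value, 1)), position = dodge, vjust = -0.5) +
  labs(x = "Beverage", y = "Mean Percentage", fill = "Gender")
print(p)
```

The fill mapping in the top-level aes is what tells geom_text which group each label belongs to, so it has to stay there (or be repeated as group=) for the dodging to line up.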

  • is there a way to change the names of the variables on the code? or should I go back and change the data? :/ – roscoe1895 Mar 10 '14 at 18:17
  • I'm not sure I understand what you mean? If you want any of the labeling on the plot done differently this can be done without modifying the original data file, but you'll need to be more specific. – Christie Haskell Marsh Mar 12 '14 at 16:45
  • Huge help. Thanks a lot. – Jim G. Oct 07 '16 at 12:12
  • Assume you have just two lists of sums. How can you declare your table structure with columns from start to end? This is what keeps me from applying your method; I extended it here http://stackoverflow.com/q/40554479/54964 – Léo Léopold Hertz 준영 Nov 11 '16 at 19:51
  • How can I use the geom_text(aes(x=, y=, label=mylabels)) option with dodged bars in order to get a label centered on each bar? – skan Sep 26 '17 at 14:43
13

You can plot the means without resorting to external calculations and additional tables using stat_summary(...). In fact, stat_summary(...) was designed for exactly what you are doing.

library(ggplot2)
library(reshape2)            # for melt(...)
gg <- melt(df,id="gender")   # df is your original table
# fun= is the ggplot2 >= 3.3.0 name for fun.y=
ggplot(gg, aes(x=variable, y=value, fill=factor(gender))) + 
  stat_summary(fun=mean, geom="bar",position=position_dodge(1)) + 
  scale_fill_discrete("Gender")

To add "error bars" you can also use stat_summary(...) (here, I'm using the min and max values rather than sd because you have so little data).

# fun, fun.min and fun.max are the ggplot2 >= 3.3.0 names for fun.y, fun.ymin and fun.ymax
ggplot(gg, aes(x=variable, y=value, fill=factor(gender))) + 
  stat_summary(fun=mean, geom="bar",position=position_dodge(1)) + 
  stat_summary(fun.min=min,fun.max=max,geom="errorbar",
               color="grey40",position=position_dodge(1), width=.2) +
  scale_fill_discrete("Gender")
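
With more observations per group you would probably want the standard error instead of the range. A sketch of that variant, assuming ggplot2 >= 3.3.0 and using ggplot2's mean_se helper through fun.data; the long table is built here with base R's stack() only so that the block runs without reshape2, and it should have the same shape as melt(df, id="gender"):

```r
library(ggplot2)

# Sample data copied from the question
df <- data.frame(
  tea    = c(14.55, 24.92997199, 23.03732304, 225.51781276, 24.53662842),
  coke   = c(26.50793651, 24.50980392, 30.63063063, 24.6064623, 26.03706973),
  beer   = c(22.53968254, 26.05042017, 25.41827542, 24.85501243, 25.24271845),
  water  = c(40, 24.50980393, 20.91377091, 50.80645161, 24.18358341),
  gender = c(1, 2, 1, 1, 2)
)

# Long format: stack() puts the four beverage columns on top of each other
long <- stack(df[c("tea", "coke", "beer", "water")])
gg <- data.frame(gender   = rep(df$gender, times = 4),
                 variable = long$ind,
                 value    = long$values)

# mean_se returns y, ymin and ymax (mean plus/minus one standard error)
p <- ggplot(gg, aes(x = variable, y = value, fill = factor(gender))) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(1)) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               color = "grey40", position = position_dodge(1), width = 0.2) +
  scale_fill_discrete("Gender")
print(p)
```

With only two or three rows per gender the standard errors are not very meaningful, which is why the version above falls back on min and max.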

jlhoward
  • How can I use the geom_text(aes(x=, y=, label=mylabels)) option with dodged bars in order to get a label centered on each bar? – skan Sep 26 '17 at 14:44
6

Using reshape2 and dplyr. Your data:

df <- read.table(text=
"tea                coke            beer             water           gender
14.55              26.50793651     22.53968254      40              1
24.92997199        24.50980392     26.05042017      24.50980393     2
23.03732304        30.63063063     25.41827542      20.91377091     1   
225.51781276       24.6064623      24.85501243      50.80645161     1
24.53662842        26.03706973     25.24271845      24.18358341     2", header=TRUE)

Getting data into correct form:

library(reshape2)
library(dplyr)
df.melt <- melt(df, id="gender")
# %>% replaces the old %.% chain operator, which current dplyr no longer provides
bar <- group_by(df.melt, variable, gender) %>% summarise(mean=mean(value))

Plotting:

library(ggplot2)
ggplot(bar, aes(x=variable, y=mean, fill=factor(gender)))+
  geom_bar(position="dodge", stat="identity")


Carlos Cinelli
  • @Carlos is there a way to do these with different formulas added to the variables? – E B Oct 02 '16 at 04:29