
I have a dataframe of relative bacterial abundances for 152 samples (rows). I would like to plot a stacked bar plot of the overall abundances for each bacterial group across all samples (e.g. Actinobacteria vs. Bacteroidetes vs. Firmicutes, etc.). I would like it colour-coded, with a legend as well. Can someone please suggest how to do this? My issue is that I'm not sure how to get the column totals for plotting in R. Thank you.

row.names       Actinobacteria  Bacteroidetes   Firmicutes  Fusobacteria    Proteobacteria  Verrucomicrobia Other
1   sample1 0.0084246282    0.41627099  0.55475503  0.000000e+00    7.245180e-04    5.391762e-05    1.977092e-02
2   sample2 0.0168571327    0.13298800  0.80289437  3.560112e-05    4.272135e-03    4.238314e-02    5.696180e-04
3   sample3 0.0020299288    0.53813817  0.42367947  3.311006e-02    7.978327e-04    3.534702e-05    2.209189e-03
espop23
  • [Reshape](http://stackoverflow.com/questions/1181060) your data then plot. Here is [an example](http://stackoverflow.com/a/25936383/680068) using ggplot – zx8754 Jul 19 '16 at 08:05

1 Answer


I wasn't clear whether the sample names were the row names in your dataframe, so I simply recreated the data frame putting the sample name in a variable, same as the bacteria names:

Sample Actinobacteria Bacteroidetes Firmicutes Fusobacteria Proteobacteria
1 sample1    0.008424628     0.4162710  0.5547550 0.000000e+00   0.0007245180
2 sample2    0.016857133     0.1329880  0.8028944 3.560112e-05   0.0042721350
3 sample3    0.002029929     0.5381382  0.4236795 3.311006e-02   0.0007978327
  Verrucomicrobia       Other
1    5.391762e-05 0.019770920
2    4.238314e-02 0.000569618
3    3.534702e-05 0.002209189

To reproduce this dataset you can run the following command:

df <- structure(list(Sample = structure(1:3, .Label = c("sample1", 
"sample2", "sample3"), class = "factor"), Actinobacteria = c(0.0084246282, 
0.0168571327, 0.0020299288), Bacteroidetes = c(0.41627099, 0.132988, 
0.53813817), Firmicutes = c(0.55475503, 0.80289437, 0.42367947
), Fusobacteria = c(0, 3.560112e-05, 0.03311006), Proteobacteria = c(0.000724518, 
0.004272135, 0.0007978327), Verrucomicrobia = c(5.391762e-05, 
0.04238314, 3.534702e-05), Other = c(0.01977092, 0.000569618, 
0.002209189)), .Names = c("Sample", "Actinobacteria", "Bacteroidetes", 
"Firmicutes", "Fusobacteria", "Proteobacteria", "Verrucomicrobia", 
"Other"), class = "data.frame", row.names = c("1", "2", "3"))

As @zx8754 suggested, this data frame requires reshaping, i.e., moving from a wide format (one column per bacterial group) to a long format (one row per sample and group). For more info, check this link for a few examples.

If the dataframe above is named df, the following command will reshape it in long format:

library(reshape2)
df_long <- melt(df, id.vars = "Sample", variable.name = "Phyla")
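To check what the reshaped data looks like, here is a minimal sketch. It assumes `reshape2` is installed and, to stay short, rebuilds only the first three bacteria columns from the data above, so `df_small` is a stand-in name and not part of the original code:

```r
library(reshape2)

# Cut-down copy of the data above: three samples, first three groups only
df_small <- data.frame(
  Sample         = c("sample1", "sample2", "sample3"),
  Actinobacteria = c(0.0084246282, 0.0168571327, 0.0020299288),
  Bacteroidetes  = c(0.41627099, 0.132988, 0.53813817),
  Firmicutes     = c(0.55475503, 0.80289437, 0.42367947)
)

# Wide to long: every non-id column becomes a level of "Phyla",
# and its numbers go into a column called "value" (the melt default)
df_small_long <- melt(df_small, id.vars = "Sample", variable.name = "Phyla")

names(df_small_long)  # "Sample" "Phyla" "value"
nrow(df_small_long)   # 3 samples x 3 groups = 9
head(df_small_long)
```

Each row of the long data frame is one sample/group pair, which is the shape `ggplot` expects for a `fill` aesthetic.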

From here we can plot using ggplot:

library(ggplot2)
ggplot(df_long, aes(x = Sample, y = value, fill = Phyla)) + 
    geom_bar(stat = "identity")

which gives a stacked bar plot with one bar per sample, filled by `Phyla`.

[Image: stacked bar plot of the three samples]
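If you want one overall bar across all samples instead of one bar per sample (the column totals mentioned in the question), something along these lines might work. This is only a sketch: it assumes the sample names are in a `Sample` column as above, and `totals` and `df_totals` are names I made up for the example.

```r
library(ggplot2)

# Same three samples as above, built with data.frame() for brevity
df <- data.frame(
  Sample          = c("sample1", "sample2", "sample3"),
  Actinobacteria  = c(0.0084246282, 0.0168571327, 0.0020299288),
  Bacteroidetes   = c(0.41627099, 0.132988, 0.53813817),
  Firmicutes      = c(0.55475503, 0.80289437, 0.42367947),
  Fusobacteria    = c(0, 3.560112e-05, 0.03311006),
  Proteobacteria  = c(0.000724518, 0.004272135, 0.0007978327),
  Verrucomicrobia = c(5.391762e-05, 0.04238314, 3.534702e-05),
  Other           = c(0.01977092, 0.000569618, 0.002209189)
)

# Column totals over all samples; drop the non-numeric Sample column first
totals <- colSums(df[, -1])

# One row per bacterial group, ready for ggplot
df_totals <- data.frame(Phyla = names(totals), Total = as.numeric(totals))

# A constant x value puts all groups into a single stacked bar
p_total <- ggplot(df_totals, aes(x = "All samples", y = Total, fill = Phyla)) +
    geom_col()
p_total
```

If the sample names are stored as row names rather than in a column, `colSums(df)` can be used directly. Since these are relative abundances, `colMeans()` in place of `colSums()` may be easier to read, because the bar then sums to roughly 1.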

thepule
  • Thank you. Is there a way to change it so the legend title says 'Phyla' instead of variable? – espop23 Jul 19 '16 at 09:07
  • Sure, `variable` is simply the name of the column that holds the bacteria name in `df_long`. If you change the name of that column in the data frame the legend title will change accordingly. Alternatively, you can change it directly in the `melt` procedure. I edited the code to add that. – thepule Jul 19 '16 at 09:17
  • Or you may change the actual legend title without doing anything with the data: http://www.cookbook-r.com/Graphs/Legends_%28ggplot2%29/ – m-dz Jul 19 '16 at 09:30
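For reference, the last suggestion could look roughly like this. It is a sketch rather than the code from the linked page: it leaves the melted column with its default name `variable` and sets the legend title with `labs(fill = ...)`. The two-column `df_demo` is a cut-down stand-in for the real data.

```r
library(reshape2)
library(ggplot2)

# Cut-down stand-in: three samples, two bacterial groups
df_demo <- data.frame(
  Sample        = c("sample1", "sample2", "sample3"),
  Bacteroidetes = c(0.41627099, 0.132988, 0.53813817),
  Firmicutes    = c(0.55475503, 0.80289437, 0.42367947)
)

# No variable.name given, so the group column is called "variable"
df_demo_long <- melt(df_demo, id.vars = "Sample")

# labs(fill = ...) renames the legend for the fill aesthetic only;
# the data frame itself is left untouched
p_legend <- ggplot(df_demo_long, aes(x = Sample, y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    labs(fill = "Phyla")
p_legend
```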