
My data includes 6 samples (currently as row names) and 24 columns, each named after a different bacterial species; the numbers are relative abundances.

Here is the structure:

dput(sig_speciesstacked) 

structure(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3", "0.32503", "0.55197", "1.23225", "0", "0", "0", "0.11568", "1.27372", "0.04306", "0", "0", "0", "0.78402", "0.99583", "0.03723", "0", "0", "0", "0.07664", "0.0932", "0.28018", "0", "0", "0", "0.29037", "0.74246", "0.3061", "0", "0", "0", "0.22328", "0.40351", "0.00416", "0", "0", "0", "0", "0", "0", "0.23779", "0.70807", "0.00891", "0.04852", "0.34497", "0.19266", "0", "0", "0", "0.26408", "0.05026", "0.0022", "0", "0", "0", "0.31206", "0.59428", "0.15606", "0", "0", "0", "0.13716", "0.55023", "0.4716", "0", "0", "0", "0.27194", "0.57013", "0.23164", "0", "0", "0", "6.84233", "2.18166", "0.6827", "0", "0", "0", "0", "0", "0", "0.94569", "0.0108", "0.06016", "0.32686", "0.04407", "1.02125", "0", "0", "0", "0", "0", "0", "0.51243", "0.10427", "1.48269", "0", "0", "0", "1.49594", "0.90364", "0.0081", "1.27002", "1.80154", "0.33065", "0", "0", "0", "2.40484", "0.36535", "3.79276", "0", "0", "0", "4.23202", "2.63742", "0.37963", "0", "0", "0", "0.38793", "0.81874", "0.04095", "0", "0", "0", "0", "0", "0", "1.04847", "0.08983", "0.02608", "0", "0", "0", "0.14408", "0.1637", "0.07754"), .Dim = c(6L, 24L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("Sample", "Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

I am trying to make a stacked bar chart showing the abundances for the different samples (3 control and 3 disease).

First I added a column name to the one containing my sample names, so there were then 25 columns in total. 1st one contains samples, 2:25 contain the abundances of the 24 different species.

 sig_speciesstacked <- cbind(Samples= rownames(sig_speciesstacked), sig_speciesstacked)

 print(colnames(sig_speciesstacked))

 rownames(sig_speciesstacked) <- c("1", "2", "3", "4", "5", "6")

I already have installed and loaded reshape2. The code I run then is

 sig_speciesstackplot1 <- melt(sig_speciesstacked, id.vars = "Samples", variable.name = "species")

 pdf("Stackedbarplot.species.pdf", width = 6, height = 7)
 ggplot(sig_speciesstackplot1, aes(x = Samples, y = value, fill = species)) +
   geom_bar(stat = "identity", position = "fill")

The error I am met with is Error in FUN(X[[i]], ...) : object 'Samples' not found, then it will be abundance not found, then species not found.

Edit: I understand I have to change aes(x=, y=) to the column names of sig_speciesstackplot1. However, is this really the correct format of the sig_speciesstackplot1 output following melt?

         Var1  Var2                   value
     1   1     Samples                Control1
     2   2     Samples                Control2
     3   3     Samples                Control3
     4   4     Samples                Disease1
     5   5     Samples                Disease2
     6   6     Samples                Disease3
     7   1     Alistipes_finegoldii   0.32503
     8   2     Alistipes_finegoldii   0.55197
     9   3     Alistipes_finegoldii   1.23225
     10  4     Alistipes_finegoldii   0

And so on, each of the 24 species is repeated 6 times with different abundance levels corresponding to the different samples.

Not sure why Var1 and Var2 were not renamed to "Samples" and "species" respectively from my line of code above, and why the output is like that.

And running the ggplot using aes(x = Var1 etc) gets a plot that is completely wrong.

Edit: For anybody having a similar issue, please do not use cbind. An example on here made column 1 contain the sample names, which is why I used it. If you don't do this and just keep the sample names as the row names, it will work fine. Thank you very much to those who helped below!
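A likely reason cbind causes trouble here: a matrix can hold only one type, so binding a character column of sample names onto a numeric matrix turns every abundance into a string. The sketch below uses a made-up 2x2 matrix; the names `m` and `m_chr` are hypothetical and not part of the original data.

```r
# Toy numeric matrix standing in for the abundance table (values are made up)
m <- matrix(c(0.3, 0, 0.5, 0.7), nrow = 2,
            dimnames = list(c("Control1", "Disease1"), c("sp_A", "sp_B")))
is.numeric(m)

# A matrix holds a single type, so adding a character column coerces everything
m_chr <- cbind(Samples = rownames(m), m)
is.character(m_chr)
m_chr["Disease1", "sp_B"]
```

This would also explain why the dput() output above shows every number in quotes.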

pemby
  • The column names in your `ggplot` call aren't in your data. After melting, you don't have columns named "samples" and "abundance." To begin with, you didn't have a column named "samples," it was "Samples." Case matters; this is just a series of typos – camille Nov 25 '19 at 20:12
  • @camille Please could you clarify? The columns of my sig_speciesstackplot1 after melting are Var1, Var2 and value. I don't understand why there is an error when I am trying to give the x and y axis a title. The fill=species is also not recognized. – pemby Nov 25 '19 at 20:19
  • The arguments to `aes` are the names of columns in your data frame. If your data frame has columns Var1, Var2, and value, but you tell `aes` that they're samples, abundance, and species, they won't be found because they don't exist. Maybe take a look at the ggplot2 docs and tutorials there, since they're pretty detailed – camille Nov 25 '19 at 20:32
  • @camille Thanks for your reply. I don't fully understand why "melt" would have created columns with those titles. I cannot use those column titles for an x and y axis as it does not make any sense. The x axis should be my patients samples and the y axis should be relative abundance. – pemby Nov 25 '19 at 20:38
  • `melt` automatically picks column names after melting. You can change the column names back with `names(sig_speciesstackplot1) <- c("samples", "species", "abundance")`. – Frederick Nov 25 '19 at 20:43
  • I think it would be good to look through docs for the functions you're using. It's also good to look at the data you get from one function before passing it along to the next. If you look at your data frame before calling `ggplot` on it, you'll see what its names are, and that they're not the names you're giving `aes`. You can change the labels later—again, this is detailed in docs. – camille Nov 25 '19 at 20:54

1 Answer

Nice that you are posting here for the first time. I don't know if I understand your question correctly, but here is my attempt at solving it.

Please note that my approach uses pipes (%>%) and the function pivot_longer from tidyr (part of the tidyverse) instead of melt.

# load needed package (includes ggplot2), install first if not installed yet
library("tidyverse")

# putting your data into an object
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

df_plot <- sig_speciesstacked %>% 
        # making a data frame from your data
        data.frame() %>% 
        # use the matrix row names (your sample names) and put them into a column named 'type'
        rownames_to_column(var = "type") %>% 
        # pivot longer instead of melt
        pivot_longer(-type, names_to = "names", values_to = "value")

ggplot(data = df_plot,
       aes(x = names, y = value, group = type, fill = type)) + 
        geom_bar(stat = "identity", position="stack")

Created on 2019-11-25 by the reprex package (v0.3.0)


Update

After clarification and looking at your code again, the solution seems simpler. You were on the right track, except that you didn't use the correct names from your 'melted' data for the plot, as @camille pointed out.

The aesthetics (aes) in ggplot need to refer to the column names in your data (sig_speciesstackplot1). As you saw yourself, these are Var1, Var2, and value.

library("tidyverse")
library(reshape2)
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths

# Your code
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

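# note: for a matrix input, id.vars and variable.name have no effect; the columns come out as Var1, Var2, value
sig_speciesstackplot1 <- melt(sig_speciesstacked, id.vars = "Samples", variable.name = "species")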

# Correct plot
ggplot(sig_speciesstackplot1,
       aes(x=Var1, y=value, fill= Var2))+ 
        geom_bar(stat = "identity", position="stack") +
        theme(legend.position="bottom")

Created on 2019-11-25 by the reprex package (v0.3.0)
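As a side note, when `melt` is given a matrix it uses the array method, which, as far as I can tell, sets the column names through `varnames` rather than `id.vars` and `variable.name`. A minimal sketch, assuming reshape2 is installed and using a made-up matrix (`m` and `long` are hypothetical names):

```r
library(reshape2)

# Toy matrix standing in for the abundance table (values are made up)
m <- matrix(c(0.3, 0, 0.5, 0.7), nrow = 2,
            dimnames = list(c("Control1", "Disease1"), c("sp_A", "sp_B")))

# For a matrix, name the two dimension columns with `varnames`
long <- melt(m, varnames = c("Samples", "species"), value.name = "abundance")
names(long)
```

With those names in place, `aes(x = Samples, y = abundance, fill = species)` would refer to columns that actually exist.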


Update 2

If you want them 'stacked' as a percentage, you can use position = "fill" like so:

ggplot(sig_speciesstackplot1,
       aes(x=Var1, y=value, fill= Var2))+ 
        geom_bar(stat = "identity", position="fill") +
        theme(legend.position="bottom")

Created on 2019-11-25 by the reprex package (v0.3.0)
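To make explicit what position = "fill" does: each bar is rescaled so that its segments sum to 1. A rough base R equivalent, shown on a made-up matrix (`m` and `props` are hypothetical names), is to divide each row by its total:

```r
# Toy abundances: rows are samples, columns are species (values are made up)
m <- matrix(c(3, 1, 5, 1, 2, 2), nrow = 2,
            dimnames = list(c("Control1", "Disease1"),
                            c("sp_A", "sp_B", "sp_C")))

# Share of each species within a sample; every row now sums to 1
props <- prop.table(m, margin = 1)
rowSums(props)
```

This is only meant to illustrate the scaling; ggplot2 does the equivalent internally when position = "fill" is used.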


Update 3

After re-examining the OP's code and the comments below, I want to share the following.

The OP used reshape2::melt() on a matrix with rownames. This issue is discussed here: Why reshape2's Melt cannot capture rownames in the transformation?

Below, I compare the behaviour of reshape2::melt() for a matrix and a data.frame. The latter one shows the intended behaviour.

# OP's code
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))
sig_speciesstacked <- cbind(Samples= rownames(sig_speciesstacked), sig_speciesstacked)
rownames(sig_speciesstacked) <- c("1", "2", "3", "4", "5", "6")

# Using reshape2::melt on a matrix
sig_speciesstackplot1 <- reshape2::melt(sig_speciesstacked,
                              id.vars = "Samples", variable.name = "species")
head(sig_speciesstackplot1)
#>   Var1    Var2    value
#> 1    1 Samples Control1
#> 2    2 Samples Control2
#> 3    3 Samples Control3
#> 4    4 Samples Disease1
#> 5    5 Samples Disease2
#> 6    6 Samples Disease3

# Using reshape2::melt on a data.frame with stringsAsFactors = FALSE
sig_speciesstackplot1 <- reshape2::melt(as.data.frame(sig_speciesstacked,
                                                      stringsAsFactors = FALSE),
                              id.vars = "Samples", variable.name = "species")
head(sig_speciesstackplot1)
#>    Samples              species   value
#> 1 Control1 Alistipes_finegoldii 0.32503
#> 2 Control2 Alistipes_finegoldii 0.55197
#> 3 Control3 Alistipes_finegoldii 1.23225
#> 4 Disease1 Alistipes_finegoldii       0
#> 5 Disease2 Alistipes_finegoldii       0
#> 6 Disease3 Alistipes_finegoldii       0

Created on 2019-11-26 by the reprex package (v0.3.0)
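One thing to watch in the second output above: the `value` column appears to still hold strings (the zeros print as a bare `0`), because the matrix became character after `cbind`. A minimal sketch, on made-up data with the hypothetical names `m_chr` and `df`, of converting the abundance columns back to numeric before melting:

```r
# Made-up character matrix, as produced by cbind-ing sample names onto numbers
m_chr <- cbind(Samples = c("Control1", "Disease1"),
               sp_A = c("0.3", "0"),
               sp_B = c("0.5", "0.7"))

df <- as.data.frame(m_chr, stringsAsFactors = FALSE)

# Convert every column except the sample names back to numeric
df[-1] <- lapply(df[-1], as.numeric)
str(df)
```

Without a step like this, ggplot would probably treat the abundances as discrete labels rather than numbers.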

Frederick
  • Thank you for your help. I need to have 6 bars (3 for control and 3 for disease), each with colored stacks based on the relative abundance of the 24 different species, so there should be 24 different colours, one for each species. So the colour key and the x axis need to swap, if that makes sense? – pemby Nov 25 '19 at 20:31
  • Thanks for the update. The bars are all meant to be the same height; what should vary is the proportion colored by each species, based on the relative abundance found in that sample. When I look at sig_speciesstackplot1, the Var1, Var2 and value columns contain mixed-up entries. For instance, the Var2 column has the word "Samples" 6 times going down, then the value column has my Control1, Control2, Control3, Disease1, Disease2, Disease3. Is there a better way to give you my data than dput(head())? – pemby Nov 25 '19 at 21:00
  • I updated my answer with bar charts of same height. I don't see "Samples" appearing 6 times in `sig_speciesstackplot1$Var2`. – Frederick Nov 25 '19 at 21:09
  • Thank you for your help. It is much appreciated. Please could you double check the output of my sig_speciesstackplot1, which I have posted above? I don't have any examples to go by when checking whether this is the correct format following melt. – pemby Nov 25 '19 at 21:47
  • I ran through the steps again, got the same sig_speciesstackplot1 output as above. Even when renaming Var1 and Var2 to "Samples" and "Species" respectively, and then using aes(x=Samples, y=value, fill= Species)) I still get an error saying that Samples cannot be found, so I suspect the format following melt is incorrect. – pemby Nov 25 '19 at 22:37
  • Hey @pemby did it work for you? I could reproduce Frederick's plot. Did you check that you use the same sig_speciesstacked as he did with his code? Looking at what you dput(), it's all characters which is really weird. – StupidWolf Nov 25 '19 at 23:40
  • Hey @pemby, I used dput() from Frederick's answer. Using the dput() you provided, I had to manipulate it a bit to get the plot. Can you dput the original sig_speciesstacked, before you did this: cbind(Samples= rownames(sig_speciesstacked), sig_speciesstacked) – StupidWolf Nov 26 '19 at 11:29
  • @StupidWolf, Thank you for your comment. I managed to find my error and will adjust the above, in terms of where I went wrong. I shouldn't have done cbind, as the rownames should be left as the sample names. I was following some steps from another post on here, and it worked for them but not for me in this case. – pemby Nov 26 '19 at 11:46
  • I again updated my answer for the use of `reshape2::melt()` with a `matrix` versus a `data.frame`. The latter one shows the intended behaviour. – Frederick Nov 26 '19 at 12:17
  • hey Frederick, glad it worked in the end :) Great effort. – StupidWolf Nov 26 '19 at 15:22