
I am trying to make a stacked barplot using R. The main sticking point is using the colors from the color column in the plot appropriately.

Requirements of the plot:

  • Each bar (x axis) should represent a time point.
  • Each species should be drawn in its own color (given by the color column), with its share of the bar reflecting its abundance (y axis).
  • Within each bar, species in the same phylum should be grouped together.
  • Setting the width of the bars would be really cool, but is not necessary.

Characteristics of the dataset:

  • Each species has its own color, and the species colors form a gradient within each phylum.
  • The abundances of species within a time point sum to 100.
  • Not every species is present at every time point.
  • There are 7 time points, 8 phyla, and 132 species.

Other ideas on how to represent these data are welcome.

Representative data:

phyla           species                         abundance    color    time
Actinobacteria  Bifidobacterium_adolescentis    18.73529    #F7FBFF   D30
Firmicutes      Faecalibacterium_prausnitzii    14.118      #F7FCF5   D30
Firmicutes      Catenibacterium_mitsuokai       12.51944    #F3F9F2   D30
Bacteroidetes   Bacteroides_ovatus              7.52241     #FFF5EB   D30
Firmicutes      Faecalibacterium_prausnitzii    21.11866    #F7FCF5   D7
Firmicutes      Ruminococcus_sp_5_1_39BFAA      13.54397    #92B09C   D7
Actinobacteria  Bifidobacterium_adolescentis    10.21989    #F7FBFF   D7
Actinobacteria  Bifidobacterium_adolescentis    38.17028    #F7FBFF   D90
Firmicutes      Catenibacterium_mitsuokai       11.04982    #F3F9F2   D90
Firmicutes      Faecalibacterium_prausnitzii    9.82507     #F7FCF5   D90
Actinobacteria  Collinsella_aerofaciens         5.2334      #D4DEE9   D90

Thank you in advance; I am banging my head against the wall with this.

Code thanks to Robert.

#reshape the dataframes as matrices
#species are row names and times are columns (abundance data makes up matrix)
#put the matrix times in the correct order
#create stacked barplot that has the width of column reflecting shannon index
#save the stacked barplots in files named by the entry list
for(i in 1:n){
  phyl=aggregate(abundance ~ phyla+species+color+time, dfs[[i]], sum)
  phyl=phyl[with(phyl,order(phyla,species,time)),]
  wide <- reshape(phyl, idvar = c("phyla","species","color"),
                  timevar = "time", direction = "wide")
  wide[is.na(wide)]<-0
  wide

  res1=as.matrix(wide[,-c(1:3)],ncol=dim(wide[,-c(1:3)])[2])
  colnames(res1)=
    unlist(strsplit(colnames(res1), ".", fixed = TRUE))[seq(2,length(colnames(res1))*2,by=2)]
  rownames(res1)=wide$species
  res1 <- res1[,c('E','FMT','PA','PF','D7','D30','D90')]

  bar.width <- as.matrix(div.dfs[[i]]['frac'])

  mypath <- file.path(output.path,paste(project.name, "_", lhs[i], ".tiff", sep = ""))
  tiff(file=mypath)
  mytitle = paste(project.name, lhs[i])
  barplot(res1,col=wide$color,beside = F, width = c(bar.width), main = mytitle, legend.text=F,args.legend=
            list(x = "top",bty="n",cex=.6,ncol=2))
  dev.off()

  rm(res1)
}

#makes the legend and exports it as an EPS file
setwd(output.path)
plot_colors <- database$color
text <- database$species
setEPS()
postscript('legend.eps')
plot.new()
par(xpd=TRUE)
legend("center",legend = text, text.width = max(sapply(text, strwidth)),
       col=plot_colors, lwd=1, cex=.2, horiz = F, ncol=2, bty='n')
par(xpd=FALSE)
dev.off()
  • possible duplicate of [Stacked Bar Plot in R](http://stackoverflow.com/questions/20349929/stacked-bar-plot-in-r) – Michal Jun 11 '15 at 00:49
  • You can use the reshape2 package. The `melt` and `dcast` functions can help you change the data to a more usable format – Michal Jun 11 '15 at 00:51
  • Can you explain further what you mean? Also, will this solve the problem with the hex code coloring? – shadowofzedark Jun 11 '15 at 01:29

1 Answer


This version ignores phyla:

cols=sapply(unique(dat$species),function(sp)unique(dat$color[dat$species==sp]))
res=tapply(dat$abundance, list(species = dat$species, time = dat$time), sum)
res[is.na(res)]<-0
barplot(res,col=cols,beside = F,legend.text=T,args.legend=
          list(x = "top",bty="n",cex=.6,ncol=2))
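If the colors come out mismatched, one possible cause is ordering: `unique()` returns species in order of first appearance, while `tapply()` sorts the matrix rows by species. A minimal sketch of one way around this, assuming every species has exactly one color, is to look each color up by row name with `match()`. The toy `dat` below is a subset of the representative data from the question.

```r
# Toy subset of the representative data from the question.
dat <- data.frame(
  phyla     = c("Actinobacteria", "Firmicutes", "Firmicutes", "Bacteroidetes",
                "Firmicutes", "Actinobacteria"),
  species   = c("Bifidobacterium_adolescentis", "Faecalibacterium_prausnitzii",
                "Catenibacterium_mitsuokai", "Bacteroides_ovatus",
                "Faecalibacterium_prausnitzii", "Bifidobacterium_adolescentis"),
  abundance = c(18.73529, 14.118, 12.51944, 7.52241, 21.11866, 10.21989),
  color     = c("#F7FBFF", "#F7FCF5", "#F3F9F2", "#FFF5EB", "#F7FCF5", "#F7FBFF"),
  time      = c("D30", "D30", "D30", "D30", "D7", "D7"),
  stringsAsFactors = FALSE
)

# Species-by-time abundance matrix; tapply() sorts the species row names.
res <- tapply(dat$abundance, list(species = dat$species, time = dat$time), sum)
res[is.na(res)] <- 0

# Assumption: one color per species. Look up the color of each matrix row
# by name, so cols always has one entry per row, in row order.
cols <- dat$color[match(rownames(res), dat$species)]

barplot(res, col = cols, beside = FALSE)
```

Because `cols` is built from `rownames(res)`, its length always equals the number of rows in `res`.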

This approach takes phyla into account:

phyl=aggregate(abundance ~ phyla+species+color+time, dat, sum)
phyl=phyl[with(phyl,order(phyla,species,time)),]
wide <- reshape(phyl, idvar = c("phyla","species","color"),
          timevar = "time", direction = "wide")
wide[is.na(wide)]<-0
wide

res1=as.matrix(wide[,-c(1:3)],ncol=dim(wide[,-c(1:3)])[2])
colnames(res1)=
unlist(strsplit(colnames(res1), ".", fixed = TRUE))[seq(2,length(colnames(res1))*2,by=2)]
rownames(res1)=wide$species

barplot(res1,col=wide$color,beside = F,legend.text=T,args.legend=
          list(x = "top",bty="n",cex=.6,ncol=2))
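A side note on the column renaming above: `reshape()` names the wide columns `abundance.<time>`, and splitting on every "." would break if a time label itself contained a dot. A small sketch of an alternative that strips only the leading prefix follows; the `wide_names` vector, including the `D7.5` label, is hypothetical and only stands in for `colnames(res1)`.

```r
# Hypothetical column names, shaped like those produced by
# reshape(..., direction = "wide") with the default separator ".".
wide_names <- c("abundance.D30", "abundance.D7", "abundance.D7.5")

# Remove only the leading "abundance." so dots inside a time label survive.
time_labels <- sub("^abundance\\.", "", wide_names)
```

In the code above this would amount to `colnames(res1) <- sub("^abundance\\.", "", colnames(res1))`.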

Robert
  • Not sure why, but this does not match the colors correctly. The length of cols does not match the number of species in the res matrix. Could I solve the problem of grouping the phyla by matching the phyla to the species in the res matrix, reordering the matrix by phyla, and then making the barplot from the reordered matrix with the phyla column deleted? – shadowofzedark Jun 11 '15 at 20:19
  • Yes, the only thing you need to consider is to change the order of cols as well, since you will change the order of the species. I'll try to upload the plot I got here. – Robert Jun 11 '15 at 22:42
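
The reordering discussed in these comments could be sketched roughly as follows. This is only an illustration on the representative data from the question, not code from the original answer. It assumes each species belongs to exactly one phylum and has exactly one color; the `key` table and the bar widths are made up for the example.

```r
# Representative data from the question.
dat <- data.frame(
  phyla = c("Actinobacteria", "Firmicutes", "Firmicutes", "Bacteroidetes",
            "Firmicutes", "Firmicutes", "Actinobacteria", "Actinobacteria",
            "Firmicutes", "Firmicutes", "Actinobacteria"),
  species = c("Bifidobacterium_adolescentis", "Faecalibacterium_prausnitzii",
              "Catenibacterium_mitsuokai", "Bacteroides_ovatus",
              "Faecalibacterium_prausnitzii", "Ruminococcus_sp_5_1_39BFAA",
              "Bifidobacterium_adolescentis", "Bifidobacterium_adolescentis",
              "Catenibacterium_mitsuokai", "Faecalibacterium_prausnitzii",
              "Collinsella_aerofaciens"),
  abundance = c(18.73529, 14.118, 12.51944, 7.52241, 21.11866, 13.54397,
                10.21989, 38.17028, 11.04982, 9.82507, 5.2334),
  color = c("#F7FBFF", "#F7FCF5", "#F3F9F2", "#FFF5EB", "#F7FCF5", "#92B09C",
            "#F7FBFF", "#F7FBFF", "#F3F9F2", "#F7FCF5", "#D4DEE9"),
  time = c(rep("D30", 4), rep("D7", 3), rep("D90", 4)),
  stringsAsFactors = FALSE
)

# Species-by-time abundance matrix, with 0 where a species is absent.
res <- tapply(dat$abundance, list(species = dat$species, time = dat$time), sum)
res[is.na(res)] <- 0

# Hypothetical lookup table: one row per species with its phylum and color.
key <- unique(dat[, c("phyla", "species", "color")])

# Group species by phylum, then alphabetically within each phylum.
key <- key[order(key$phyla, key$species), ]

# Apply the same order to the matrix rows and to the colors.
res  <- res[key$species, , drop = FALSE]
cols <- key$color

# Put the time points in chronological rather than alphabetical order.
res <- res[, c("D7", "D30", "D90"), drop = FALSE]

# The widths are arbitrary illustration values, one per bar.
barplot(res, col = cols, beside = FALSE, width = c(1, 2, 1))
```

Since the rows and `cols` are both taken from `key`, they stay aligned however the species are sorted, and the phyla column never has to be part of the matrix.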