I am working with a data frame that has the following header names:
> [1] "Filename" "Strain" "DNA_Source" "Locus_Tag" "Product" "Transl_Tbl" "Note" "Seq_AA" "Protein_ID"
Using the following code I get a graph that shows how many genes are found within a particular bacterial strain:
png(filename=paste('images/Pangenome_Histogram.png', sep=''), width=3750,height=2750,res=300)
par(mar=c(9.5,4.3,4,2))
print(h <- ggplot(myDF, aes(x=Strain, stat='bin', fill=factor(Filename), label=myDF$Filename)) + geom_bar() +
labs(title='Gene Count by Strain Pangenome', x='Campylobacter Strains', y='Gene Count\n') +
guides(title.theme = element_text(size=15, angle = 90)) + theme(legend.text=element_text(size=15), text = element_text(size=18)) +
theme(axis.text.x=element_text(angle=45, size=16, hjust=1), axis.text.y=element_text(size=16), legend.position='none', plot.title = element_text(size=22)) )
Perhaps a bit hard to see, but for example, some strains have bars that are multi-colored -- indicating that some of the strain's genes are coming from sources other than the bacterial chromosome (or from several chromosomes if the bacteria has multiple chromosomes). I would like to label the bars according to the source of the genes ("DNA_Source") at the appropriate position.
png(filename=paste('images/Pangenome_Histogram.png', sep=''), width=3750,height=2750,res=300)
par(mar=c(9.5,4.3,4,2))
print(h <- ggplot(myDF, aes(x=Strain, stat='bin', fill=factor(Filename), label=myDF$Filename)) + geom_bar() +
labs(title='Gene Count by Strain Pangenome', x='Campylobacter Strains', y='Gene Count\n') +
guides(title.theme = element_text(size=15, angle = 90)) + theme(legend.text=element_text(size=15), text = element_text(size=18)) +
geom_text(aes(label=DNA_Source, y='identity'), color='black', vjust=-5, size=4) +
theme(axis.text.x=element_text(angle=45, size=16, hjust=1), axis.text.y=element_text(size=16), legend.position='none', plot.title = element_text(size=22)) )
This gets me close, but it removes the count from the y-axis (and adds the word "identity" on the lower left hand side) and it labels the contributions on top of each other so that they cannot be read unless it is the same word.
I would like have the y-axis labeled like the first image, with the labels in the second -- but I would like for those labels to appear within their corresponding color portion of the histogram (similar visually to here: Showing data values on stacked bar chart in ggplot2), but I would like to accomplish it using the ggplot2 package.
I hope this is clear. Help is appreciated -- so thanks in advance.
Here is a bit of data (tail(dput(myDF[c(2, 3, 5)])))...
Strain DNA_Source Product
12299 Campy3194c Plasmid Type VI secretion protein, VC_A0111
12300 Campy3194c Plasmid Type VI secretion protein
12301 Campy3194c Plasmid Tgh104
12302 Campy3194c Plasmid protein ImpC
12303 Campy3194c Plasmid Type VI secretion protein
12304 Campy3194c Chromosome Tgh079