I am new to ggplot2 and I am having difficulty making a bar plot for each gene grouped by 2 factors.

I would like to plot each gene individually by 2 factors: "cell_type" and "age".

The x-axis would show the 6 "cell type" categories, and inside each "cell type" category there should be 5 bars, one per "age" category. The y-axis would show the gene expression values (mean + error bars).

My code:

mat= t(exprs(eSet))
colnames(mat) = fData(eSet)$Symbol
rownames(mat) = pData(eSet)$genotype
GENOTYPE <- rownames(mat)
AGE <- pData(eSet)$age
d.f_all_genes2 <- data.frame(GENOTYPE, AGE, mat)

d.f_all_genes2[1:3,1:10]

GENOTYPE AGE X1.2.SBSRNA4 A1BG A1BG.AS1 A1CF A2LD1 A2M A2ML1 A2MP1
1 rag_a   54            0    0        0    0     0   0     0     0
2 rag_wt  54            0    0        0    0     0  18     0     0
3 wt_wt   54            0    0        0    0     0   1     0     0

melted <- melt(d.f_all_genes2, id.vars="GENOTYPE") 
head(melted)

           GENOTYPE   variable value
1           rag_a       AGE     54
2           rag_wt      AGE     54
3           wt_wt       AGE     54

Unfortunately, I lost all the genes.

I was also planning to do the following:

means <- ddply(melted, c("AGE", "variable"), summarise, mean=mean(value))
means.sem <- ddply(melted, c("AGE", "variable"), summarise, mean=mean (value),sem=sd(value)/sqrt(length(value)))
means.sem <- transform(means.sem, lower=mean-sem, upper=mean+sem)
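For reference, here is a minimal sketch of the same mean/SEM summary on a small made-up data frame. It assumes a long-format table with GENOTYPE, AGE, variable and value columns, and it swaps ddply for base R's aggregate so that it runs without plyr. The toy values and the `sem` helper are illustrative assumptions, not taken from the eSet above.

```r
# Toy long-format data (made-up values, not from the eSet above)
toy <- data.frame(
  GENOTYPE = rep(c("wt_wt", "rag_wt"), each = 4),
  AGE      = rep(c(54, 54, 90, 90), times = 2),
  variable = "A2M",
  value    = c(1, 3, 2, 6, 10, 14, 20, 24)
)

# Standard error of the mean (hypothetical helper)
sem <- function(x) sd(x) / sqrt(length(x))

# Mean and SEM for every GENOTYPE / AGE / gene combination
means.sem <- aggregate(value ~ GENOTYPE + AGE + variable, data = toy,
                       FUN = function(x) c(mean = mean(x), sem = sem(x)))

# aggregate() returns the two statistics as one matrix column; flatten it
means.sem <- do.call(data.frame, means.sem)
names(means.sem)[4:5] <- c("mean", "sem")

# Lower and upper ends of the error bars
means.sem <- transform(means.sem, lower = mean - sem, upper = mean + sem)
```

Grouping by GENOTYPE as well as AGE keeps the column that the facets or the x-axis would need later.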

ggplot(means[means$variable == "GENE of Interest=Symbol",], aes(x = factor(AGE), y = mean))  + geom_bar(stat= "identity", colour = "blue", outlier.shape = NA)+ facet_grid(~GENOTYPE) + facet_wrap(~variable) +  ylab(expression(paste(Log[2], " Expression Values"))) + theme(axis.text=element_text(size=13, color="black"),axis.title=element_text(size=12, face="bold",color="black"), plot.title=element_text(size=14,face="bold", color="black"), strip.text.x = element_text(colour = "black", face= "bold",angle = 0, size = 20)) 

Any advice on how to make this work would be highly appreciated.

Thanks a lot in advance.

tonytonov
alakatos
  • Based on the description, it seems like your `id.vars` should include `AGE` as well as `GENOTYPE`. – aosmith Aug 18 '15 at 23:06
  • Welcome to SO! Please make a fully [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), that will increase the odds for you to get a complete answer. – tonytonov Aug 19 '15 at 08:03

1 Answer

It is difficult to tell from your example, but below I'm going to assume that your original table has more than one row for each age/genotype combination.

First, aosmith is right in the comments about the melt statement: AGE has to be an id variable too. You can also give the variable column a name to make things clearer. The statement should be:

>melted <- melt(d.f_all_genes2, id.vars=c("GENOTYPE", "AGE"), variable_name="Symbol")
   GENOTYPE AGE       Symbol value
1     rag_a  54 X1.2.SBSRNA4     0
2    rag_wt  54 X1.2.SBSRNA4     0
3     wt_wt  54 X1.2.SBSRNA4     0
4     rag_a  54         A1BG     0
5    rag_wt  54         A1BG     0
6     wt_wt  54         A1BG     0
....<SNIP>...
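The `variable_name` argument above is the spelling used by the older reshape package. If you load reshape2 instead, the equivalent argument is `variable.name`. A minimal sketch on a made-up wide table, assuming reshape2 is installed (the `wide` data frame and its counts are illustrative only):

```r
library(reshape2)  # assumption: reshape2 rather than the older reshape package

# Toy wide table shaped like d.f_all_genes2 (made-up counts)
wide <- data.frame(
  GENOTYPE = c("rag_a", "rag_wt", "wt_wt"),
  AGE      = c(54, 54, 54),
  A1BG     = c(0, 0, 0),
  A2M      = c(0, 18, 1)
)

# GENOTYPE and AGE identify a sample; every other column is a gene
melted <- melt(wide, id.vars = c("GENOTYPE", "AGE"),
               variable.name = "Symbol")
```

Each gene column becomes a block of rows, so AGE stays a column of its own instead of being stacked into `value`.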

Now that you have the data in the right form, it's time to plot it. It's always difficult to describe what you want, but I'm thinking you want a grid of panels, with the genotypes arranged left to right and the genes top to bottom. You might want to consider points rather than bars, and then putting all the genotypes on a single plot, but here is how you do the bars.

First, the data you want to plot is the data in melted:

> gg <- ggplot(melted)

You want AGE on the x axis and value on the y axis, so:

> gg <- gg + aes(x=AGE, y=value)

and you want a grid of panels or facets so:

> gg <- gg + facet_grid(Symbol~GENOTYPE)

Now for a neat trick: ggplot can do the summarizing for you with stat_summary, so there is no need to do it beforehand.

> gg <- gg + stat_summary(fun.y=mean, geom="bar", fill="blue")

That adds the bars. You also need to add the error bars. I'm going to define an sem function to make it neater:

> sem <- function(x) sqrt(var(x)/length(x))
> gg <- gg + stat_summary(fun.ymin=function(x) mean(x)-sem(x),
+                         fun.ymax=function(x) mean(x)+sem(x), 
+                         fun.y=mean,
+                         geom="errorbar")
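A note for newer installs: from ggplot2 3.3.0 on, `fun.y`, `fun.ymin` and `fun.ymax` are deprecated in favour of `fun`, `fun.min` and `fun.max`. Here is a sketch of the same bars plus error bars with the newer names, on a small made-up data frame. The `toy` data and the `width` of the error bars are my assumptions, not part of the original data.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.3.0 for fun / fun.min / fun.max

# Toy long-format data (made-up values)
toy <- data.frame(
  GENOTYPE = rep(c("wt_wt", "rag_wt"), each = 4),
  AGE      = factor(rep(c(54, 54, 90, 90), times = 2)),
  Symbol   = "A2M",
  value    = c(1, 3, 2, 6, 10, 14, 20, 24)
)

# Standard error of the mean, as defined in the answer
sem <- function(x) sqrt(var(x) / length(x))

gg <- ggplot(toy, aes(x = AGE, y = value)) +
  facet_grid(Symbol ~ GENOTYPE) +
  # bar height = mean of the replicates in each AGE/GENOTYPE cell
  stat_summary(fun = mean, geom = "bar", fill = "blue") +
  # error bars = mean +/- SEM
  stat_summary(fun = mean,
               fun.min = function(x) mean(x) - sem(x),
               fun.max = function(x) mean(x) + sem(x),
               geom = "errorbar", width = 0.3)
```

Printing `gg` draws one panel per genotype, with one bar per age.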

All that remains is to add your formatting:

> gg <- gg + ylab(expression(paste(Log[2], " Expression Values"))) + theme(axis.text=element_text(size=13, color="black"),axis.title=element_text(size=12, face="bold",color="black"), plot.title=element_text(size=14,face="bold", color="black"), strip.text.x = element_text(colour = "black", face= "bold",angle = 0, size = 20)) 
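If you would rather have the layout described in the question (genotype on the x axis, with the ages as side-by-side bars inside each genotype), one possible variation is to map AGE to `fill` and dodge the bars instead of faceting. This is only a sketch under assumptions: it uses made-up data, a single gene, and the `fun` / `fun.min` / `fun.max` argument names from ggplot2 3.3.0 or later.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.3.0

# Toy long-format data for one gene (made-up values)
toy <- data.frame(
  GENOTYPE = rep(c("wt_wt", "rag_wt"), each = 4),
  AGE      = factor(rep(c(54, 54, 90, 90), times = 2)),
  value    = c(1, 3, 2, 6, 10, 14, 20, 24)
)

sem <- function(x) sqrt(var(x) / length(x))

# Use the same dodge for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

gg_dodged <- ggplot(toy, aes(x = GENOTYPE, y = value, fill = AGE)) +
  stat_summary(fun = mean, geom = "bar", position = dodge) +
  stat_summary(fun = mean,
               fun.min = function(x) mean(x) - sem(x),
               fun.max = function(x) mean(x) + sem(x),
               geom = "errorbar", position = dodge, width = 0.3)
```

With real data you would subset melted to one Symbol first, or add `facet_wrap(~Symbol)` to get one panel per gene.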
Ian Sudbery
  • Thanks Ian very much. It works the way I wanted. `ggplot(melted[melted$Symbol=="APP",], aes(x=AGE, y=value)) + stat_summary(fun.y=mean, geom="bar", fill="blue") + facet_grid(~ GENOTYPE)` I can make a plot for any gene of interest. Unfortunately, the error bar function does not work. Shall I substitute the x for the value of the gene symbol? Thank you very much. – alakatos Aug 19 '15 at 20:01
  • Sorry, my bad. I've edited the parameters to stat_summary. Should work now. – Ian Sudbery Aug 20 '15 at 15:31