I have two graphs and I am trying to overlay one on top of the other:

An example of the data frame "ge" looks like this. In actuality there are 10 Genes with 200 samples each, so there are 2000 rows and 3 columns:

Exp    Gene    Sample
903.0   1       1
1060.0  1       2
786.0   1       3
736.0   1       4
649.0   2       1
657.0   2       2
733.5   2       3
774.0   2       4

An example of the data frame "avg" looks like this. It holds the average of the data points for each gene across all samples. The real data has 10 genes, so the table is 4 columns by 10 rows:

mean       Gene   sd         se
684.2034    1   102.7142    7.191435
723.2892    2   100.6102    7.044122

The first graph draws a line through the average expression of each gene, with error bars showing the standard deviation at each point.

avggraph <- ggplot(avg, aes(x=Gene, y=mean)) + geom_point() +geom_line() + geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=.1)

The second graph plots the gene expression as one line per sample across all the genes.

linegraphs <- ggplot(ge, aes(x=Gene, y=Exp, group=Sample, colour="#000099")) + geom_line() + scale_x_discrete(limits=flevels.tge)

I would like to superimpose avggraph on top of linegraphs. Is there a way to do this? I've tried avggraph + linegraphs but I'm getting an error. I think this is because the graphs are generated by two different data frames.

I should also point out that the axes of both graphs are the same. Both graphs have the genes on the X-axis and the gene expression on the Y-axis.

Any help would be greatly appreciated!

Sheila

2 Answers

One way is to add the geom_line command for the second plot to the first plot. You need to tell ggplot that this geom is based on a different data set:

ggplot(avg, aes(x=Gene, y=mean)) + 
  geom_point() + 
  geom_line() + 
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=.1) +
  geom_line(data = ge, aes(x=Gene, y=Exp, group=Sample, colour="#000099"),
            show.legend = FALSE)

The last geom_line command creates the lines based on the raw data.
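For anyone who wants to try this without the original data, here is a minimal sketch built from the eight example rows in the question. The summary is recomputed from those rows, so its numbers will not match the "avg" table above, and the colour is set as a fixed value outside aes(), which is my assumption about the intended result. ggplot2 draws layers in the order they are added, so the raw lines go first here and the average ends up on top.

```r
library(ggplot2)

# Toy data copied from the question's example rows (the real data has 2000 rows).
ge <- data.frame(
  Exp    = c(903, 1060, 786, 736, 649, 657, 733.5, 774),
  Gene   = rep(1:2, each = 4),
  Sample = rep(1:4, times = 2)
)

# Per-gene summary recomputed from the toy rows only, so the values
# differ from the "avg" table shown in the question.
avg <- data.frame(
  Gene = sort(unique(ge$Gene)),
  mean = as.numeric(tapply(ge$Exp, ge$Gene, mean)),
  sd   = as.numeric(tapply(ge$Exp, ge$Gene, sd))
)

p <- ggplot(avg, aes(x = Gene, y = mean)) +
  # This layer swaps in its own data and y aesthetic; x = Gene is inherited.
  geom_line(data = ge, aes(y = Exp, group = Sample), colour = "#000099") +
  # These layers fall back to the plot-level data (avg) and aesthetics.
  geom_line() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.1) +
  geom_point()

print(p)
```

Any layer without its own data argument uses the data frame given to ggplot(), which is why only the raw-data layer needs one.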

Sven Hohenstein
  • Thanks @Sven. This works, but when I do this the avg line (in black) is behind the pink lines. Because there are so many pink lines in my data, you can't see the black one. Any suggestions on how to flip it? – Sheila Nov 21 '12 at 07:32
  • @ShilaP Just change the order of the geoms: `ggplot() + geom_line(data = ge, aes(x=Gene, y=Exp, group=Sample, colour="#000099"), show.legend = FALSE) + geom_line(data = avg, aes(x=Gene, y=mean)) + geom_errorbar(data = avg, aes(x=Gene, ymin=mean-sd, ymax=mean+sd), width=.1) + geom_point(data = avg, aes(x=Gene, y=mean))` – Sven Hohenstein Nov 21 '12 at 07:36
  • Got it! Yes, I thought I would need to reorder the geom_line() calls somehow but couldn't figure out exactly how. Thanks for your help! – Sheila Nov 21 '12 at 07:52

The workaround I found was to merge the data instead of merging the two plots. I added an extra column to the end of each of the two data frames to label which one a row came from, then combined them with rbind. After that, I used either the fill or the colour aesthetic to separate the two plots. Of course, in my case both plots were meant to use the same axis scales.
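A rough sketch of that idea on the example rows from the question, in case it helps. The column names value, series and type are made up for this illustration, and the per-gene mean is recomputed from the toy rows rather than taken from the "avg" table. Both data frames need identical columns before rbind will stack them.

```r
library(ggplot2)

# Toy data copied from the question's example rows.
ge <- data.frame(
  Exp    = c(903, 1060, 786, 736, 649, 657, 733.5, 774),
  Gene   = rep(1:2, each = 4),
  Sample = rep(1:4, times = 2)
)

# Raw measurements: one series per sample, labelled type = "raw".
# (value, series and type are hypothetical column names.)
raw_rows <- data.frame(
  Gene   = ge$Gene,
  value  = ge$Exp,
  series = paste("sample", ge$Sample),
  type   = "raw"
)

# Per-gene mean as one extra series, labelled type = "mean".
means <- aggregate(Exp ~ Gene, data = ge, FUN = mean)
mean_rows <- data.frame(
  Gene   = means$Gene,
  value  = means$Exp,
  series = "mean",
  type   = "mean"
)

# Same columns in both frames, so they stack into one long data frame.
combined <- rbind(raw_rows, mean_rows)

p <- ggplot(combined, aes(x = Gene, y = value, group = series, colour = type)) +
  geom_line() +
  scale_colour_manual(values = c(raw = "grey60", mean = "black"))

print(p)
```

Because type is mapped inside aes(), the legend comes for free. Error bars would need the sd carried along as an extra column, which this sketch leaves out.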

  • Great approach - with multiple data sets this scales a lot better than the multiple-layers approach, and you get a legend. Demonstrating it on the data from the question would make this answer better. – Gregor Thomas Jul 02 '18 at 18:52