
I'm trying to visualize my data by making box plots for a group of genes, so that every gene gets a separate box for each strain. The data looks more or less like this:

Strain  gene1         gene2      gene3  .   .   .
 A    2.6336700     1.42802     0.935742
 A    2.0634700     2.31232     1.096320
 A    2.5798600     2.75138     0.714647
 B    2.6031200     1.31374     1.214920
 B    2.8319400     1.30260     1.191770
 B    1.9796000     1.74199     1.056490
 C    2.4030300     1.20324     1.069800
 C    1.4829864     5.570571    12.29139
 C    0.7212928     6.070519    11.63530
 .
 .
----------

So for this example I would like to get 3 different pictures (one per gene), and each picture should contain 3 boxes (one per strain). There is probably a nice and easy way of doing this, but so far I'm drawing a blank.


Thanks for all the answers and advice; it was all really helpful.

user2764233
    Have a look [here](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/) and [here](http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/). From the same cookbook: "To make graphs with ggplot2, the data must be in a data frame, and in 'long' (as opposed to wide) format." – Henrik Sep 30 '13 at 08:48

2 Answers


Here is one way to do it with ggplot2.

First, reshape your data frame to long format:

library(reshape2)
# Strain is the id column; every gene column is stacked into variable/value
dfm <- melt(df, id.vars="Strain")

Then plot, with one facet per gene:

library(ggplot2)
ggplot(data=dfm) + geom_boxplot(aes(x=Strain,y=value)) + facet_wrap(~variable)

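For reference, here is a self-contained sketch of the same approach using the sample data from the question. Two things in it go beyond the code above and are my own assumptions: the column names `Gene` and `Expression`, and `scales = "free_y"`, which gives each gene its own y-axis (gene3 has much larger values for strain C than the other genes). The `plots` list at the end is one possible way to get a separate plot per gene, as asked for in the question.

```r
library(reshape2)
library(ggplot2)

# Example data copied from the question (wide format: one column per gene)
df <- read.table(header = TRUE, text = "
Strain gene1     gene2    gene3
A      2.6336700 1.42802  0.935742
A      2.0634700 2.31232  1.096320
A      2.5798600 2.75138  0.714647
B      2.6031200 1.31374  1.214920
B      2.8319400 1.30260  1.191770
B      1.9796000 1.74199  1.056490
C      2.4030300 1.20324  1.069800
C      1.4829864 5.570571 12.29139
C      0.7212928 6.070519 11.63530
")

# Long format: one row per (Strain, Gene) measurement
dfm <- melt(df, id.vars = "Strain",
            variable.name = "Gene", value.name = "Expression")

# One panel per gene, each with its own y-axis range
p <- ggplot(dfm, aes(x = Strain, y = Expression)) +
  geom_boxplot() +
  facet_wrap(~ Gene, scales = "free_y")
print(p)

# Alternatively, build one separate plot object per gene
plots <- lapply(split(dfm, dfm$Gene), function(d) {
  ggplot(d, aes(x = Strain, y = Expression)) +
    geom_boxplot() +
    ggtitle(as.character(d$Gene[1]))
})
```

Each element of `plots` can then be printed or passed to `ggsave()` to write one image file per gene.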

juba

The lattice package is excellent for this kind of grouping, and I have always found it easier to work with than ggplot2. You can also do it with the old-fashioned base R "ink-on-paper" method, although that is a bit more manual.

First you need to reshape the data frame (by the way, this step is more neatly done with the reshape2 package that juba suggested, but I'll keep my solution as an alternative).

str <- "Strain  gene1         gene2      gene3
 A    2.6336700     1.42802     0.935742
 A    2.0634700     2.31232     1.096320
 A    2.5798600     2.75138     0.714647
 B    2.6031200     1.31374     1.214920
 B    2.8319400     1.30260     1.191770
 B    1.9796000     1.74199     1.056490
 C    2.4030300     1.20324     1.069800
 C    1.4829864     5.570571    12.29139
 C    0.7212928     6.070519    11.63530"

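# Read the example data, then stack the gene columns into long format
tab <- read.table(text=str, header=TRUE)
tab <- data.frame(Strain=tab$Strain, stack(tab[-1]))

# stack() names its columns "values" and "ind"; give them descriptive names
names(tab) <- c("Strain", "Expression", "Gene")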

Then plot, with one panel per gene:

library(lattice)
bwplot(Expression ~ Strain | Gene, tab)


Or, the other way around, combine everything into a single panel:

bwplot(paste(Strain, Gene) ~ Expression, tab)

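The base R route mentioned at the top of this answer could look roughly like the sketch below. It works directly on the wide data, so no reshaping is needed. The name `tab_wide` and the side-by-side layout are just my choices for illustration.

```r
# Example data copied from the question (wide format: one column per gene)
tab_wide <- read.table(header = TRUE, text = "
Strain gene1     gene2    gene3
A      2.6336700 1.42802  0.935742
A      2.0634700 2.31232  1.096320
A      2.5798600 2.75138  0.714647
B      2.6031200 1.31374  1.214920
B      2.8319400 1.30260  1.191770
B      1.9796000 1.74199  1.056490
C      2.4030300 1.20324  1.069800
C      1.4829864 5.570571 12.29139
C      0.7212928 6.070519 11.63530
")

# Every column except Strain is treated as a gene
genes <- setdiff(names(tab_wide), "Strain")

# Draw one panel per gene, side by side; remember the old settings
op <- par(mfrow = c(1, length(genes)))
for (g in genes) {
  # split() groups the gene's values by strain, giving one box per strain
  boxplot(split(tab_wide[[g]], tab_wide$Strain),
          main = g, xlab = "Strain", ylab = "Expression")
}
par(op)  # restore the previous graphics settings
```

To get a separate image file per gene instead, drop the `par(mfrow = ...)` line and wrap each `boxplot()` call in `png(paste0(g, ".png"))` and `dev.off()`.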

Backlin