
I have a data frame as shown below:

      G1         G2       G3         G4       group
S_1   0          269.067  0.0817233  243.22   N
S_2   0          244.785  0.0451406  182.981  N
S_3   0          343.667  0.0311259  351.329  N
S_4   0          436.447  0.0514887  371.236  N
S_5   0          324.709  0          293.31   N
S_6   0          340.246  0.0951976  393.162  N
S_7   0          382.889  0.0440337  335.208  N
S_8   0          368.021  0.0192622  326.387  N
S_9   0          267.539  0.077784   225.289  T
S_10  0          245.879  0.368655   232.701  T
S_11  0          17.764   0          266.495  T
S_12  0          326.096  0.0455578  245.6    T
S_13  0          271.402  0.0368059  229.931  T
S_14  0          267.377  0          248.764  T
S_15  0          210.895  0.0616382  257.417  T
S_16  0.0401525  183.518  0.0931699  245.762  T
S_17  0          221.535  0.219924   203.275  T

Now I want to make a multi-boxplot with all 4 genes (the columns). The first 8 rows are normal samples and the remaining 9 rows are tumor samples, so for each gene I should get 2 boxplots labelled by tissue. I can make individual boxplots, but how do I put all 4 genes in one plot, label the tissue for each boxplot, and overlay the points as in a stripchart? Is there an easy way to do it? So far I can only make individual plots using the row and column names; I cannot label the boxes by the group column or add the stripchart points. Any help will be appreciated. Thanks

ivivek_ngs

2 Answers


With facet_wrap:

head(df)

    G1      G2        G3      G4 group
S_1  0 269.067 0.0817233 243.220     N
S_2  0 244.785 0.0451406 182.981     N
S_3  0 343.667 0.0311259 351.329     N
S_4  0 436.447 0.0514887 371.236     N
S_5  0 324.709 0.0000000 293.310     N
S_6  0 340.246 0.0951976 393.162     N

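library(reshape2)
library(ggplot2)

# Reshape to long format: one row per sample/gene pair, with group kept as the id column
df <- melt(df, id.vars = "group")

# One panel per gene, each with its own axis scales; boxes are split and coloured by group
ggplot(df, aes(x = variable, y = value, group = group, col = group)) +
  facet_wrap(~variable, scales = 'free') +
  geom_boxplot()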

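Since the question also asks for tissue labels and stripchart-style points, here is a minimal, self-contained sketch of one way to get both. It rebuilds the data frame from the values posted in the question, puts `group` on the x-axis inside each gene panel so every box is labelled N or T, and overlays the points with `geom_jitter()`. The jitter width and the axis titles ("Tissue", "Value") are my own choices, not something given in the question.

```r
library(reshape2)
library(ggplot2)

# Data copied from the question: 8 normal (N) samples followed by 9 tumor (T) samples
df <- data.frame(
  G1 = c(rep(0, 15), 0.0401525, 0),
  G2 = c(269.067, 244.785, 343.667, 436.447, 324.709, 340.246, 382.889, 368.021,
         267.539, 245.879, 17.764, 326.096, 271.402, 267.377, 210.895, 183.518,
         221.535),
  G3 = c(0.0817233, 0.0451406, 0.0311259, 0.0514887, 0, 0.0951976, 0.0440337,
         0.0192622, 0.077784, 0.368655, 0, 0.0455578, 0.0368059, 0, 0.0616382,
         0.0931699, 0.219924),
  G4 = c(243.22, 182.981, 351.329, 371.236, 293.31, 393.162, 335.208, 326.387,
         225.289, 232.701, 266.495, 245.6, 229.931, 248.764, 257.417, 245.762,
         203.275),
  group = rep(c("N", "T"), times = c(8, 9)),
  row.names = paste0("S_", 1:17)
)

# Long format: one row per sample/gene pair; the gene name goes into column "gene"
df_long <- melt(df, id.vars = "group", variable.name = "gene")

# Tissue on the x-axis labels each box; one panel per gene with its own y-axis.
# outlier.shape = NA hides the boxplot outliers so they are not drawn twice.
p <- ggplot(df_long, aes(x = group, y = value, colour = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +
  facet_wrap(~gene, scales = "free_y") +
  labs(x = "Tissue", y = "Value")

print(p)
```

Because each point is jittered only horizontally and only within its own tissue position, the dots stay on top of the box they belong to.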

Sandipan Dey

I'm not sure what you mean by stripchart points; I assumed you wanted to show the actual data points overlaid on the boxplots. Would the following suffice?

library(ggplot2)
library(dplyr)
library(reshape2)

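# Reshape to long format (group is the id column), then draw boxplots with jittered points
melt(df, id.vars = "group") %>%
  ggplot(aes(x = variable, y = value, col = group)) +
  geom_boxplot() +
  geom_jitter()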

Here df is the data frame from the question.

thepule
  • Just one more question: I plotted this on a log scale for better resolution and found that some dots fall on the other box, i.e. points belonging to the green boxplot appear over the pink one. In that case, don't we need to scale the plot, and if so, which scaling would be ideal? – ivivek_ngs Sep 15 '16 at 11:11
  • Can you please tell me, is it a good idea to show the dots over the boxplot of each gene when some of them land on the other group's box, which might be a bit misleading? What kind of scaling should be applied here? In another case I plotted the dots because the sample size in each group is below 20, so it seemed better to show them. What do you think? – ivivek_ngs Sep 16 '16 at 08:27
  • I am not sure what added value showing the points on the graph would have, especially with loads of points. If you want a more sophisticated boxplot, then I would suggest you look into [pirate plots](http://nathanieldphillips.com/2016/04/pirateplot-2-0-the-rdi-plotting-choice-of-r-pirates/) :) – thepule Sep 19 '16 at 08:24
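On the dots landing over the other group's box: a likely cause is that `geom_jitter()` scatters points around the gene's x position without following the side-by-side dodging of the boxes. A minimal sketch of one possible fix is to place the points with `position_jitterdodge()`. This is an assumption about what the commenter saw, not something confirmed in the thread. The example uses only the G2 and G4 columns from the question to stay short, and `log1p()` (log of 1 + value) is my own choice of transform because the data contain zeros; the comment does not say which log scale was used.

```r
library(reshape2)
library(ggplot2)

# Subset of the question's data (genes G2 and G4 only) to keep the sketch short
df <- data.frame(
  G2 = c(269.067, 244.785, 343.667, 436.447, 324.709, 340.246, 382.889, 368.021,
         267.539, 245.879, 17.764, 326.096, 271.402, 267.377, 210.895, 183.518,
         221.535),
  G4 = c(243.22, 182.981, 351.329, 371.236, 293.31, 393.162, 335.208, 326.387,
         225.289, 232.701, 266.495, 245.6, 229.931, 248.764, 257.417, 245.762,
         203.275),
  group = rep(c("N", "T"), times = c(8, 9))
)
df_long <- melt(df, id.vars = "group")

# position_jitterdodge() dodges the points by group (matching the boxes) and
# then jitters them, so each dot stays over the box of its own group.
# log1p(value) is an assumed transform; it stays finite for zero values.
p <- ggplot(df_long, aes(x = variable, y = log1p(value), colour = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.75))

print(p)
```

The `dodge.width` of 0.75 matches the default dodging width of `geom_boxplot()`, which is why the points line up with the boxes.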