I have a question about how to draw a boxplot that integrates three different types of information. In particular, I have a data frame that looks like this:

  Exp_number  Condition  Cell_Type  Gene1   Gene2     Gene3
       1          2       Cancer    0.33    0.2       1.2
       1          2       Cancer    0.12    1.12      2.5
       1          4       Fibro     3.4     2.2       0.8
       2          4       Cancer    0.12    0.4       0.11
       2          4       Normal    0.001   0.01      0.001
       3          1       Cancer    0.22    1.2       3.2
       2          1       Normal    0.001   0.00003   0.00045

for a total of 20,000 columns and 110 rows (rows are samples).

I would like to draw a boxplot in which the data are grouped first by condition. Then, within each condition, I would like to highlight the Exp_number, for example using different colors, and finally I would also like to highlight the cell type, although I don't know how. The aim is to show the differences in gene expression between Exp_numbers across conditions, and also the differences between cell types across Exp_numbers. Is there a simple way to integrate all this information in a single plot?

Thank you in advance

NewUsr_stat
    See if [this question](https://stackoverflow.com/questions/14604439/plot-multiple-boxplot-in-one-graph) helps you solve the problem. – Rui Barradas Sep 09 '18 at 10:47

2 Answers


What about this approach?

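# ggplot2 draws the plot; reshape2 provides melt()
library(ggplot2)
library(reshape2)

# Simulated data with the same layout as in the question
dat <- data.frame(Exp_number=factor(sample(1:3, 100, replace=TRUE)), 
                  condition=factor(sample(1:4, 100, replace=TRUE)), 
                  Cell_type=factor(sample(c("Normal", "Cancer", "Fibro"), 100, replace=TRUE)), 
                  Gene1=abs(rnorm(100, 5, 1)), 
                  Gene2=abs(rnorm(100, 6, 0.5)), 
                  Gene3=abs(rnorm(100, 4, 3)))

# Long format: one row per sample and gene
dat2 <- melt(dat, id.vars=c("Exp_number", "condition", "Cell_type"))

# Exp_number on the x axis, cell type as color, one panel per condition
ggplot(dat2, aes(x=Exp_number, y=value, col=Cell_type)) + 
    geom_boxplot() + 
    facet_grid(~ condition) + 
    theme_bw() + 
    ylab("Expression")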

That gives the following result:

[Image not preserved: boxplots of expression by Exp_number, colored by Cell_type, with one panel per condition]

storaged
  • Is there a way not to plot the experiment number? In other words, when I apply your solution, the experiment number is plotted in the final image, so that, for example, if the experiment number is 1, a vertical line of dots corresponding to 1 is plotted. I will edit my question with a figure so that the problem is clear. – NewUsr_stat Sep 09 '18 at 18:01
  • If you want to remove the `facet_grid` labels, you can add `+ theme( strip.background = element_blank(), strip.text.x = element_blank() )` at the end of the plotting command. Is that what you are looking for? If so, I will edit my answer and add it. – storaged Sep 09 '18 at 18:30
  • That was my point! Solved with your suggestion! Thank you very much! – NewUsr_stat Sep 10 '18 at 08:38
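
The suggestion from the comment above can be written out as a complete call. This is only a minimal sketch: it rebuilds the simulated `dat2` from the answer so that it runs on its own, and `set.seed(1)` and the object name `p` are additions for illustration, not part of the original answer.

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # assumption: fixed seed so the simulated data are reproducible
dat <- data.frame(Exp_number = factor(sample(1:3, 100, replace = TRUE)),
                  condition  = factor(sample(1:4, 100, replace = TRUE)),
                  Cell_type  = factor(sample(c("Normal", "Cancer", "Fibro"), 100, replace = TRUE)),
                  Gene1 = abs(rnorm(100, 5, 1)),
                  Gene2 = abs(rnorm(100, 6, 0.5)),
                  Gene3 = abs(rnorm(100, 4, 3)))

# Long format: one row per sample and gene
dat2 <- melt(dat, id.vars = c("Exp_number", "condition", "Cell_type"))

p <- ggplot(dat2, aes(x = Exp_number, y = value, col = Cell_type)) +
  geom_boxplot() +
  facet_grid(~ condition) +
  theme_bw() +
  ylab("Expression") +
  # Hide the facet strips (the condition labels above each panel)
  theme(strip.background = element_blank(),
        strip.text.x = element_blank())
print(p)
```

The `theme()` call is placed after `theme_bw()` on purpose: a complete theme added later would reset the strip settings.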

Similar to @storaged's answer, but leveraging the two dimensions of facet_grid to represent 2 of your variables:

ggplot(dat2, aes(x=Cell_type, y=Expression)) + 
  geom_boxplot() + 
  facet_grid(Exp_number ~ condition) + 
  theme_bw() 

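[Image not preserved: boxplots of Expression by Cell_type in a grid of panels, with Exp_number as rows and condition as columns]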

The data:

library(ggplot2)
library(reshape2)
dat <- data.frame(Exp_number=factor(sample(1:3, 100, replace=TRUE)), 
                  condition=factor(sample(1:4, 100, replace=TRUE)), 
                  Cell_type=factor(sample(c("Normal", "Cancer", "Fibro"), 100, replace=TRUE)), 
                  Gene1=abs(rnorm(100, 5, 1)), 
                  Gene2=abs(rnorm(100, 6, 0.5)), 
                  Gene3=abs(rnorm(100, 4, 3)))

# Long format, with the measured values stored in a column named Expression
dat2 <- melt(dat, id.vars=c("Exp_number", "condition", "Cell_type"), value.name = 'Expression')
# Readable facet labels
dat2$Exp_number <- paste('Exp.', dat2$Exp_number)
dat2$condition <- paste('Condition', dat2$condition)
dww