This question has been asked before, but not in a way that fits my data, so I'll try again :)

I want to make multiple individual ggplots without having to specify how each one should be made every time. My dataset contains gene expression data, and from it I want to plot specific genes.

Let's use this as an example:

df <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="GENE   SYMBOL          Patient1   Patient2   
                   TP53         ILMN_2            3.55        3.66
                   TP53         ILMN_3            5.49        4.99
                   XBP1         ILMN_5            4.06        2.53
                   TP27         ILMN_1            2.53        3.33
                   REDD1        ILMN_4            3.99        4.56
                   ERO1L        ILMN_6            5.02        6.95
                   STK11        ILMN_9            3.64        2.01
                   HIF2A        ILMN_8            2.96        4.76 ")

In order to plot selected genes from df, I usually do the following:

First I load the packages and make an object I can use for searching in the dataframe:

library(dplyr)
library(tidyr)
library(ggplot2)

SYMBOL_info <- select(df, SYMBOL)

Then, I define the gene I'm interested in as:

geneOfInterest <- c(SYMBOL_info == "ILMN_5")

Next, the gene of interest is found in the dataframe, and the dataframe is gathered into long format to fit ggplot's requirements:

df_gather<- df %>%
  filter(geneOfInterest) %>% 
  gather(key=Patient, value=values, -c(GENE, SYMBOL))

In the end, the gene of interest in the dataset is plotted:

# Build the plot first, then save it; ggsave() should not be added with `+`
p <- ggplot()+
  geom_point(df_gather, mapping = aes(x=Patient, y=values, color=GENE))+
  labs(title="XBP1 plot", subtitle = "Symbol: ILMN_5")
ggsave("XBP1_plot.png", plot = p)

However, I have many genes I'd like to plot, e.g. both versions of TP53, REDD1, STK11 and HIF2A. Any suggestions for how this can be done without having to change geneOfInterest and the information within the plotting part of the code every time? I guess a for loop is needed, but copying the other solutions given here didn't help me (as shown here: R: saving multiple ggplots using a for loop).

Thanks in advance! :)

EDIT: SYMBOL-values are changed to start with ILMN_ instead of just numbers

Beate
  • I'd avoid the loop and write a little function instead. Also I get an error: `object 'PROBE_ID' not found` – Axeman Sep 07 '17 at 13:29
  • This worked for me using facets: `new.df <- melt(df %>% select(-SYMBOL))`, then `ggplot(new.df) + geom_point(aes(x = variable, y = value, color = GENE)) + facet_wrap( ~ GENE)` – Kozolovska Sep 07 '17 at 13:35

2 Answers

Depending on the number of genes and the amount of data you want to plot, I recommend plotting everything in one plot using facets, like this:

# The column is named GENE (upper case) in the example data
df %>% 
  filter(GENE %in% c("TP53", "ERO1L", "HIF2A")) %>% 
  gather(key, value, -GENE, -SYMBOL) %>% 
  ggplot(aes(key, value, fill=GENE))+
    geom_col()+
    facet_wrap(~GENE+ SYMBOL, labeller = label_both)

Otherwise try this:

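sapply(c("ILMN_2", "ILMN_3", "ILMN_9"), function(x){
  # Look up the gene name (first column) for this symbol
  geneOfInterest <- df[ df$SYMBOL == x, 1]
  p <- df %>%
   filter(SYMBOL == x) %>% 
   gather(key=Patient, value=values, -GENE,-SYMBOL) %>% 
   ggplot(aes(x=Patient, y=values, color=GENE))+
    geom_point()+
    labs(title="XBP1 plot", subtitle = "Symbol: 5")  # labels are still hard-coded here
  # Save the plot explicitly instead of adding ggsave() with `+`
  ggsave(paste0(geneOfInterest, "_plot.png"), plot = p)
})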
Roman

Thank you, Jimbou!

I just want to add that I changed your suggested function into this (to include the Gene and Symbol in both the plot and file-name):

sapply(c("ILMN_2", "ILMN_5", "ILMN_9"), function(x){
  # Look up the gene name (first column) for this symbol
  geneOfInterest <- df[ df$SYMBOL == x, 1]
  p <- df %>%
    filter(SYMBOL == x) %>% 
    gather(key=Patient, value=values, -GENE,-SYMBOL) %>% 
    ggplot(aes(x=Patient, y=values, color=GENE))+
    geom_point()+
    labs(title=paste0(geneOfInterest, " plot"), subtitle =paste0("Symbol: ", x))
  # Save the plot explicitly instead of adding ggsave() with `+`
  ggsave(paste0(geneOfInterest,"_symbol_",x, "_plot.png"), plot = p)
})
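
Following Axeman's comment about writing a small function, the same idea could be sketched as below. This is only a sketch under a few assumptions: `plot_symbol()` and its `out_dir` argument are hypothetical names that do not appear in the original post, `pivot_longer()` (tidyr 1.0.0 or later) stands in for `gather()`, and only part of the example data is repeated to keep the block self-contained.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Subset of the example data from the question
df <- read.table(header = TRUE,
                 stringsAsFactors = FALSE,
                 text = "GENE   SYMBOL   Patient1   Patient2
                 TP53   ILMN_2   3.55   3.66
                 TP53   ILMN_3   5.49   4.99
                 XBP1   ILMN_5   4.06   2.53
                 STK11  ILMN_9   3.64   2.01")

# Hypothetical helper (not from the original post): plot one SYMBOL,
# save it as a PNG, and return the path of the file that was written.
plot_symbol <- function(data, symbol, out_dir = tempdir()) {
  long <- data %>%
    filter(SYMBOL == symbol) %>%
    pivot_longer(cols = -c(GENE, SYMBOL),
                 names_to = "Patient", values_to = "values")
  gene <- unique(long$GENE)

  p <- ggplot(long, aes(x = Patient, y = values, color = GENE)) +
    geom_point() +
    labs(title = paste(gene, "plot"), subtitle = paste("Symbol:", symbol))

  path <- file.path(out_dir, paste0(gene, "_symbol_", symbol, "_plot.png"))
  ggsave(path, plot = p, width = 5, height = 4)
  path
}

# One file per symbol of interest
symbols <- c("ILMN_2", "ILMN_3", "ILMN_9")
files <- vapply(symbols, function(s) plot_symbol(df, s), character(1))
```

Writing to `tempdir()` is just a default chosen for the sketch; passing `out_dir = "."` would write into the working directory instead. Because the symbol is part of the file name, the two TP53 probes do not overwrite each other.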
Beate