I am fairly new to this. I want to construct a volcano plot that looks something like this:

[image not shown]

This is what I have so far:

[image not shown]

With the following code:

genes <- read_excel("VolcanoData.xlsx")
genes$Significant <- ifelse(genes$pvalue < 0.00000519, "FDR < 0.00000519", "Not Sig")

ggplot(data=genes, aes(x = rho, y = -log10(pvalue)))+
  geom_point(aes(color = Significant), size=0.1)+
  theme_bw(base_size = 12) + theme(legend.position = "bottom")+
  scale_color_manual(values = c("red", "grey"))

And data that looks like this:

head(genes)
# A tibble: 6 x 5
  gene       rho   pvalue label  Significant     
  <chr>    <dbl>    <dbl> <chr>  <chr>           
1 NUBPL   -0.936 9.79e-30 normal FDR < 0.00000519
2 EPB41L5 -0.931 2.41e-29 ND     FDR < 0.00000519
3 PIGU    -0.930 4.49e-29 normal FDR < 0.00000519
4 TSHR    -0.920 6.78e-27 normal FDR < 0.00000519
5 ENPEP   -0.916 1.11e-26 normal FDR < 0.00000519
6 SEC22A  -0.910 3.88e-26 normal FDR < 0.00000519

tail(genes)
# A tibble: 6 x 5
  gene          rho pvalue label  Significant
  <chr>       <dbl>  <dbl> <chr>  <chr>      
1 HIGD1B  0.00144    0.993 normal Not Sig    
2 CHST3  -0.000712   0.996 normal Not Sig    
3 TLR10   0.000418   0.998 normal Not Sig    
4 AVPR1A -0.000333   0.998 ND     Not Sig    
5 MFSD10 -0.000314   0.998 normal Not Sig    
6 PARP10  0.0000317  1.000 normal Not Sig 

I would like to color only the genes labeled "ND" in black. I've tried different combinations but I can't seem to make it work. Thank you!

Yaiza95
  • Could you make your problem reproducible by sharing a sample of your data and the code you're working on so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Feb 13 '19 at 00:40
  • sounds like you might want to use `gghighlight` – GordonShumway Feb 13 '19 at 00:55
  • @Tung, I can't figure out how the reprex works, but I have pasted the code and the example of df is the same I'm using – Yaiza95 Feb 13 '19 at 01:14
  • @GordonShumway I can't figure out how to only highlight the ones I want – Yaiza95 Feb 13 '19 at 01:14

1 Answer

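You can plot all genes as before, then add a second geom_point layer that uses only the "ND" subset of the gene data (selected with the grepl function) and draws it in black:

ggplot(data=genes, aes(x = rho, y = -log10(pvalue)))+
  # base layer: every gene, colored by significance
  geom_point(aes(color = Significant), size=0.1)+
  # overlay: only the genes labeled "ND", drawn in black on top
  geom_point( data = genes[grepl("ND", genes$label),], 
              color = "black", 
              size=0.1)+
  theme_bw(base_size = 12) + 
  theme(legend.position = "bottom")+
  scale_color_manual(values = c("red", "grey"))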
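
Another option, sketched here under assumptions, is to fold the "ND" label into a single grouping column and map color to it, so that "ND" also gets its own legend entry. The `Group` column name is hypothetical, and the sample rows are copied from the question's `head()`/`tail()` output in place of the real `VolcanoData.xlsx`:

```r
library(ggplot2)

# A few rows copied from the question's head()/tail() output,
# standing in for the full Excel file.
genes <- data.frame(
  gene   = c("NUBPL", "EPB41L5", "PIGU", "HIGD1B", "AVPR1A", "PARP10"),
  rho    = c(-0.936, -0.931, -0.930, 0.00144, -0.000333, 0.0000317),
  pvalue = c(9.79e-30, 2.41e-29, 4.49e-29, 0.993, 0.998, 1.000),
  label  = c("normal", "ND", "normal", "normal", "ND", "normal"),
  stringsAsFactors = FALSE
)
genes$Significant <- ifelse(genes$pvalue < 0.00000519,
                            "FDR < 0.00000519", "Not Sig")

# Hypothetical helper column: "ND" overrides the significance class.
genes$Group <- ifelse(genes$label == "ND", "ND", genes$Significant)

p <- ggplot(genes, aes(x = rho, y = -log10(pvalue))) +
  geom_point(aes(color = Group), size = 0.1) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom") +
  # Named values tie each color to its group regardless of level order.
  scale_color_manual(values = c("FDR < 0.00000519" = "red",
                                "Not Sig"          = "grey",
                                "ND"               = "black"))
```

Printing `p` draws the plot. With many overlapping points, the "ND" genes may be covered by others, since points are drawn in row order; sorting the data so the "ND" rows come last is one way to keep them visible.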
Ulises Rosas-Puchuri