Overlapping two gene sets ,finding their overlap significance and plotting them

Question

(Fig. 3a, b, Extended Data Fig. 3a, b and Supplementary Table 1). After 48 h, more than one-third of the transcriptome was differentially expressed (>5,000 genes; 405 genes encoding for proteins in the extracellular region, Gene Ontology (GO) accession 0005576), significantly overlapping with the gene expression changes of A375 tumours in vivo after 5 days of vemurafenib treatment (Fig. 3a, b and Extended Data Fig. 3c). Similar extensive gene expression changes were observed in Colo800 and UACC62 melanoma cells treated with vemurafenib and H3122 lung adenocarcinoma cells treated with crizotinib (Extended Data Fig. 3d). Despite different cell lineages, different oncogenic drivers, and different targeted therapies we observed a significant overlap between the secretome of melanoma and lung adenocarcinoma cells (P < 9.11 × 10−5)

The original paper

I would like to see similar to the figure f where it shows the intersection and significance overlap. To achieve that i got this code working till the intersection part but I dont know how to run the significance part.

library(reshape2)
library(venneuler)
RNA_seq_cds <- read.csv("~/Downloads/RNA_seq_gene_set.txt", header=TRUE, sep="\t")
head(RNA_seq_cds)
ATAC_seq <- read.csv("~/Downloads/ATAC_seq_gene_set.txt", header=TRUE, sep="\t")
head(ATAC_seq)
RNA_seq <- RNA_seq_cds
ATAC_seq <- ATAC_seq

#https://stackoverflow.com/questions/6988184/combining-two-data-frames-of-different-lengths
cbindPad <- function(...) {
  args <- list(...)
  n <- sapply(args, nrow)
  mx <- max(n)
  pad <- function(x, mx) {
    if (nrow(x) < mx) {
      nms <- colnames(x)
      padTemp <- matrix(NA, mx - nrow(x), ncol(x))
      colnames(padTemp) <- nms
      if (ncol(x) == 0) {
        return(padTemp)
      } else {
        return(rbind(x, padTemp))
      }
    } else {
      return(x)
    }
  }
  rs <- lapply(args, pad, mx)
  return(do.call(cbind, rs))
}

dat <- cbindPad(ATAC_seq, RNA_seq)

vennfun <- function(x) { 
  x$id <- seq(1, nrow(x))  #add a column of numbers (required for melt)
  xm <- melt(x, id.vars="id", na.rm=TRUE)  #melt table into two columns (value & variable)
  xc <- dcast(xm, value~variable, fun.aggregate=length)  #remove NA's, list presence/absence of each value for each variable (1 or 0)
  rownames(xc) <- xc$value  #value column=rownames (required for Venneuler)
  xc$value <- NULL  #remove redundent value column
  xc  #output the new dataframe
}

#https://stackoverflow.com/questions/9121956/legend-venn-diagram-in-venneuler
VennDat <- vennfun(dat)
genes.venn <- venneuler(VennDat)
genes.venn$labels <- c("RNA", "\nATAC" )
# plot(genes.venn, cex =15, )
#https://stackoverflow.com/questions/30225151/how-to-create-venn-diagram-in-r-studio-from-group-of-three-frequency-column
#https://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html   
vd <- venneuler(VennDat)
vd$labels <- paste(genes.venn$labels, colSums(VennDat))
plot(vd, cex=10)
text(.3, .45, 
     bquote(bold("Common ="~.(as.character(sum(rowSums(VennDat) == 2))))), 
     col="red", cex=1)

LABS <- vd$labels

The above code gives me the intersection plot

Now the significance part how do i do that between two gene sets and show it as shown in the original figure.

My data which i have used to generate the above plot

Any suggestion or help would be really appreciated.

score 1 · Accepted Answer · answered Sep 11 '21 at 17:50

1

If you talk about how to place any text under your figure, just use 'text' as you did before. It's just some guessing on which x= and y= coordinates. Thexpd=TRUE allows you to plot over the margin.

VennDat <- vennfun(dat)
vd <- venneuler(VennDat)
vd$labels <- paste(c("RNA", "ATAC"), colSums(VennDat))

plot(vd, cex=10, border=c(NA, 'red'), col=c('#6b65af', '#ad7261'))
text(x=.5, y=.5, sum(rowSums(VennDat) == 2), xpd=TRUE)
text(.5, .15, 'overlap\n', xpd=TRUE)
text(.5, .13, bquote(italic(p)*'< 9.11E-55'), xpd=TRUE)

I also adjusted some parameters of plot. You may inspect the code of the plotting method using:

venneuler:::plot.VennDiagram

If you want to know how significance is calculated, you should post your question at Cross Validated.

answered Sep 11 '21 at 17:50

jay.sf

60,139
8
53
110

thank you for the answer actually I didn't know how to calculate the test required to know the overlap significance – PesKchan Sep 11 '21 at 18:05
shall I repost the question again on the other site? – PesKchan Sep 11 '21 at 18:06
1

@PesKchan Yes of course. You can also consider [bioinformatics.stackexchange](https://bioinformatics.stackexchange.com/help/on-topic). – jay.sf Sep 11 '21 at 18:16

Overlapping two gene sets ,finding their overlap significance and plotting them

1 Answers1