
I have searched the forum but have not found an answer to my question.

I have the following dataset:

[Image: Dataset]

The important columns are TGEClass and peptide.

I would like to calculate the overlap between the different TGE classes.

I used calculate.overlap(TGE) from VennDiagram, but that does not give me the desired result.

The R code with a dummy dataset:

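library(VennDiagram)

# Two dummy sets with identical contents
C1 <- as.data.frame(letters[1:10])
C2 <- as.data.frame(letters[1:10])
data <- cbind(C1, C2)

# Compute the overlap, then coerce the returned list to a data frame
overlap <- calculate.overlap(data)
overlap <- as.data.frame(overlap)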

The result:

  a1 a2 a3
1  a  a  a
2  b  b  b
3  c  c  c
4  d  d  d
5  e  e  e
6  f  f  f

The desired result would look like this:

[Image: Desired Result (TGEClass)]

10 genes are expressed in both TGE classes

50 genes in only alternative

60 genes in only short

It is basically a Venn diagram, but in table format.

Please note that each gene has a different number of TGE class categories.

I am very new to R so any help will be greatly appreciated.

Thanks very much,

Ishack

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we can't copy/paste them into R. Show the code you actually tried. – MrFlick Jul 01 '19 at 15:38
  • Hi MrFlick I have added the code I tested and the produced result. The problem is my categories are in rows rather than columns. How can I fix this please? – Ishack Marshook Jul 01 '19 at 15:49

1 Answer

The output of VennDiagram::calculate.overlap() is not very convenient for later use (here, as.data.frame only worked because you got lucky: all the returned vectors happen to have the same length).
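To illustrate why the conversion is fragile, here is a minimal sketch in base R. It does not call VennDiagram at all: the list below is hand-built and only assumed to mimic the shape seen in the question's output (a1 and a2 as the two sets, a3 as their intersection).

```r
# Hand-built list, assumed to mimic a two-set overlap result
# (a1, a2 = the two sets, a3 = their intersection)
overlap <- list(
  a1 = letters[1:10],
  a2 = letters[8:24],
  a3 = intersect(letters[1:10], letters[8:24])
)

# as.data.frame() needs columns of compatible length, so it fails here;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(as.data.frame(overlap), error = function(e) conditionMessage(e))
print(res)

# lengths() reports the size of each element without any conversion
sizes <- lengths(overlap)
print(sizes)

# stack() turns the ragged list into a long data frame (columns: values, ind)
long <- stack(overlap)
print(head(long))
```

Working from `lengths()` or the long form from `stack()` avoids relying on the elements having equal length.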

You can actually use tidyverse to compute it yourself, and return the summary:

library(tidyverse)
list(
  "Cardiome" = letters[1:10],
  "SuperSet" = letters[8:24]
) %>% 
  map2_dfr(., names(.), ~tibble::enframe(.x) %>% mutate(group=.y)) %>% 
  add_count(value) %>%  
  group_by(value) %>% 
  summarise(group2 = ifelse(n()==2, "both", group)) %>% 
  count(group2)
#> # A tibble: 3 x 2
#>   group2       n
#>   <chr>    <int>
#> 1 both         3
#> 2 Cardiome     7
#> 3 SuperSet    14
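The question's real data holds the categories in rows (one row per peptide and TGEClass) rather than in separate vectors. The same idea can be applied to that layout; below is a minimal base R sketch. The data frame `tge` and its values are made up for illustration, with only the column names TGEClass and peptide taken from the question.

```r
# Hypothetical long-format data: one row per (peptide, TGEClass) pair
tge <- data.frame(
  peptide  = c("p1", "p2", "p3", "p3", "p4", "p5", "p5"),
  TGEClass = c("alternative", "alternative", "alternative", "short",
               "short", "alternative", "short"),
  stringsAsFactors = FALSE
)

# Drop duplicated pairs so each peptide/class combination counts once
tge <- unique(tge)

# For each peptide, collapse the classes it appears in into a single label
membership <- tapply(
  tge$TGEClass, tge$peptide,
  function(cls) paste(sort(unique(cls)), collapse = " & ")
)

# Count how many peptides fall under each label (the Venn regions as a table)
overlap_table <- as.data.frame(table(membership), stringsAsFactors = FALSE)
names(overlap_table) <- c("classes", "n")
print(overlap_table)
```

Because the label is built from whatever classes a peptide has, this also copes with peptides that belong to a different number of classes.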

If you want to stick with the output of VennDiagram::calculate.overlap(), you can use something like:

library(tidyverse)
overlap <- VennDiagram::calculate.overlap(
  x = list(
    "Cardiome" = letters[1:10],
    "SuperSet" = letters[8:24]
  )
)
# For two sets the result is a list: a1 and a2 are the inputs, a3 their intersection
map2_dfr(overlap, names(overlap), ~tibble::enframe(.x) %>% mutate(group=.y)) %>% 
  select(-name) %>%  # drop the positional index so rows are matched on value only
  spread(group, group) %>% 
  mutate(a1_only = !is.na(a1) & is.na(a2),
         a2_only = !is.na(a2) & is.na(a1),
         both = !is.na(a2) & !is.na(a1)) %>% 
  summarise_at(c("a1_only", "a2_only", "both"), sum) %>% 
  gather(group, number, everything())
#> # A tibble: 3 x 2
#>   group   number
#>   <chr>    <int>
#> 1 a1_only      7
#> 2 a2_only     14
#> 3 both         3
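For just two sets, the "only in one set" and "in both" counts can also be obtained without any packages, using base R set operations. A minimal sketch, reusing the Cardiome and SuperSet vectors from above; the helper name `overlap_counts` is hypothetical:

```r
# Hypothetical helper: count elements unique to each of two named sets
# and elements shared by both
overlap_counts <- function(sets) {
  stopifnot(length(sets) == 2, !is.null(names(sets)))
  a <- unique(sets[[1]])
  b <- unique(sets[[2]])
  data.frame(
    group  = c(paste(names(sets), "only"), "both"),
    number = c(length(setdiff(a, b)),    # in the first set only
               length(setdiff(b, a)),    # in the second set only
               length(intersect(a, b))), # in both sets
    stringsAsFactors = FALSE
  )
}

result <- overlap_counts(list(Cardiome = letters[1:10], SuperSet = letters[8:24]))
print(result)
```

This is limited to two sets by design; with more sets the number of regions grows quickly and a per-element membership label is easier to manage.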
Matifou