I have the following tibble:
library(tidyverse)
df <- tibble::tribble(
~gene, ~celltype,
"a", "cel1_1",
"b", "cel1_1",
"c", "cel1_1",
"a", "cell_2",
"b", "cell_2",
"c", "cell_3",
"d", "cell_3"
)
df %>% group_by(celltype)
#> Source: local data frame [7 x 2]
#> Groups: celltype [3]
#>
#> # A tibble: 7 x 2
#> gene celltype
#> <chr> <chr>
#> 1 a cel1_1
#> 2 b cel1_1
#> 3 c cel1_1
#> 4 a cell_2
#> 5 b cell_2
#> 6 c cell_3
#> 7 d cell_3
The genes in the overlap can be grouped the following way
cell1 a,b,c
cell2 a,b
cell3 c,d
What I want to do is to calculate gene overlap for all cell, resulting in this table:
cell1 cell2 cell3
cell1 3 2 1
cell2 2 2 0
cell3 1 0 2
How can I achieve that?
Update
And finally calculate percentage (divide by largest denominator in pair)
#cell1 cell2 cell3
cell1 1.00(3/3) 0.67 (2/3) 0.33 (1/3)
cell2 0.67 (2/3) 1.00 0
cell3 0.33 (1/3) 0 1.00
I tried this but doesn't get what I want:
> tmp <- crossprod(table(df))
> tmp/max(tmp)
celltype
celltype cel1_1 cell_2 cell_3
cel1_1 1.0000000 0.6666667 0.3333333
cell_2 0.6666667 0.6666667 0.0000000
cell_3 0.3333333 0.0000000 0.6666667
So the diagonal will always have value 1.00.