I have a table with a list of observations associated with different groups.

Animal  Sector  Time   Group
Cat     1       Night  A
Cat     1       Night  B
Cat     2       Night  B
Bat     2       Night  A
Bat     3       Night  C
Bat     3       Night  A
Bat     3       Night  B
Mouse   1       Day    B
Mouse   2       Night  A
Mouse   2       Night  B
Deer    2       Day    A
Deer    2       Night  B
Deer    2       Night  C

I count Animal + Sector + Time combined as one observation. There are no duplicate observations within a Group, but there are many between Groups in the full dataset. I would like a pairwise matrix of how many observations each pair of Groups has in common. In the example above, the identical observations for each pair of Groups would be:

Groups A + B:
Cat     1       Night
Bat     3       Night
Mouse   2       Night

Groups A + C:
Bat     3       Night

Groups B + C:
Bat     3       Night
Deer    2       Night

The closest I have is the code below, which lists the shared observations instead of creating a pairwise matrix:

df %>% 
  group_by(Animal, Sector, Time) %>% 
  summarise(
    samples = paste(unique(Group), collapse = ""), 
    n = length(unique(Group)))

I'm more interested in the number of shared observations between Groups than in the exact identity of the observations.

If anyone can give me suggestions for how to do this in dplyr or base R that would be very helpful.

Ultimately the goal is to visualise it with a pairwise matrix where each tile gives the number of shared observations between 2 Groups. I tried to make a heatmap but I'd prefer a pairwise matrix:

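# Combine the three columns into a single observation key
df$observations <- paste(df$Animal, df$Sector, df$Time)
# Group x observation contingency table
dfpw <- table(df[, c("Group", "observations")])
# Number of groups in which each observation occurs
counts <- apply(dfpw, 2, sum)
dfpw_shared <- dfpw[, which(counts >= 2)] # shared by at least two groups
heatmap(dfpw_shared, scale = "none")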

The current visualisation has the identity of the observations on the X axis and the Groups on the Y axis. I'd prefer the Groups on both the X and Y axes, with the counts of shared observations in the tiles.

I'd prefer if the visualisation showed a pairwise matrix with the number of counts shared in the tiles (including tiles with 0 shared observations between groups).

Thanks in advance for any help.

user964689
  • Isn't it just `df %>% count(Animal, Sector, Time)`? – Rui Barradas Apr 29 '20 at 16:02
  • I appreciate minimal examples, but I think you should make this one a little more complex. As far as I can tell the desired output is a heatmap with only one non-zero value. Having 2 different non-zero values would keep the example simple but provide a little more depth. – Gregor Thomas Apr 29 '20 at 16:10
  • Thanks, I edited the example to be more informative. – user964689 Apr 30 '20 at 08:03
  • I don't think that is what I am after, Rui Barradas. I don't think it summarises how many observations are shared between any 2 groups, unless I misunderstand the output? – user964689 Apr 30 '20 at 08:05

1 Answer

I am not sure if this is what you are looking for. Below is a possible solution with base R:

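# Count the observations shared by the two groups whose indices are in v.
# Relies on `dfs` (defined below) existing by the time the function is called.
sharedObs <- function(v) {
  # Paste all non-Group columns into one key per observation
  p <- do.call(paste, subset(dfs[[v[1]]], select = -Group))
  q <- do.call(paste, subset(dfs[[v[2]]], select = -Group))
  length(intersect(p, q))
}

dfs <- split(df, df$Group)  # one data frame per group
n <- length(dfs)
mat <- `dimnames<-`(matrix(0, n, n), list(names(dfs), names(dfs)))
# combn() yields the pairs in the same order as the lower triangle is filled
mat[lower.tri(mat, diag = FALSE)] <- combn(n, 2, sharedObs)
res <- t(mat) + mat  # mirror into a symmetric matrix

heatmap(res, scale = "none")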

which gives

> res
  A B C
A 0 1 0
B 1 0 0
C 0 0 0

and the heat map

[image: heatmap of res]
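As a possible alternative, the same pairwise counts can probably be obtained in one step from a contingency table. This is only a sketch, assuming (as the question states) that no observation is duplicated within a group, so the table holds only 0s and 1s. The names `obs`, `tab` and `shared` are mine, and the data frame is copied from the example in the question.

```r
# Example data copied from the question
df <- data.frame(
  Animal = c("Cat", "Cat", "Cat", "Bat", "Bat", "Bat", "Bat",
             "Mouse", "Mouse", "Mouse", "Deer", "Deer", "Deer"),
  Sector = c(1, 1, 2, 2, 3, 3, 3, 1, 2, 2, 2, 2, 2),
  Time   = c("Night", "Night", "Night", "Night", "Night", "Night", "Night",
             "Day", "Night", "Night", "Day", "Night", "Night"),
  Group  = c("A", "B", "B", "A", "C", "A", "B", "B", "A", "B", "A", "B", "C")
)

# One key per observation (Animal + Sector + Time)
obs <- paste(df$Animal, df$Sector, df$Time)

# Observation x Group incidence matrix (0/1 under the no-duplicates assumption)
tab <- unclass(table(obs, df$Group))

# t(tab) %*% tab: entry [i, j] counts observations present in both groups
shared <- crossprod(tab)

# The diagonal holds each group's own observation count; zero it out
# to match the convention used for `res` above
diag(shared) <- 0

print(shared)
# For this data: A-B share 3, A-C share 1, B-C share 2
```

Pairs with nothing in common simply come out as 0, so every tile of the matrix is filled.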

ThomasIsCoding
  • This looks like it is working. Is there a way to put the number of shared observations in each tile of the heatmap? – user964689 Apr 30 '20 at 07:57
  • @user964689 I guess it is possible to do as you described, but I am not very familiar with `heatmap`, sorry. Maybe you can post a new question about that, so people who know it well can easily answer – ThomasIsCoding Apr 30 '20 at 08:34
  • @user964689 I found this one, maybe it's what you need https://stackoverflow.com/a/14290705/12158757 – ThomasIsCoding Apr 30 '20 at 09:25
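
For the follow-up about showing the counts inside the tiles, one option that avoids `heatmap` altogether is base graphics `image()` plus `text()`. This is a rough sketch, not a tested recipe for the full dataset. The matrix below is typed in by hand from the pairwise counts listed in the question (A-B 3, A-C 1, B-C 2), and the colour palette is an arbitrary choice.

```r
# Pairwise shared-observation counts, taken from the question's example
groups <- c("A", "B", "C")
res <- matrix(c(0, 3, 1,
                3, 0, 2,
                1, 2, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(groups, groups))

n <- nrow(res)

# image() draws res[i, j] at x = i, y = j, so rows run along the X axis
image(1:n, 1:n, res, axes = FALSE,
      xlab = "Group", ylab = "Group",
      col = rev(heat.colors(12)))

# Label both axes with the group names
axis(1, at = 1:n, labels = rownames(res))
axis(2, at = 1:n, labels = colnames(res))

# Write each count in the centre of its tile, including the zeros
text(row(res), col(res), labels = res)
```

Because the matrix is symmetric, the orientation of the axes does not matter here; for a non-symmetric matrix the row/column mapping of `image()` would need more care.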