This is a toy example of my data frame, built here as a matrix:
m <- matrix(c(rep(1, 3), rep(2, 3), rep(3, 4), rep(2, 3), rep(2, 3), rep(3, 4)),
            ncol = 2, nrow = 10)
colnames(m) <- c("setID", "objID")
> m
      setID objID
 [1,]     1     2
 [2,]     1     2
 [3,]     1     2
 [4,]     2     2
 [5,]     2     2
 [6,]     2     2
 [7,]     3     3
 [8,]     3     3
 [9,]     3     3
[10,]     3     3
What I would like to do is get the percentage of objID values that are shared between each pair of setID values.
In this toy example, setID 1 and setID 2 share 100% of their objID values, and setID 3 doesn't share any objID with any other setID.
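To make the definition concrete, here is a minimal sketch for a single pair of sets. The helper name `pct_shared_pair` is hypothetical, and it assumes the percentage is taken relative to the distinct objID values of the first set; a symmetric measure (shared over union) would be an equally plausible reading.

```r
m <- matrix(c(rep(1, 3), rep(2, 3), rep(3, 4), rep(2, 3), rep(2, 3), rep(3, 4)),
            ncol = 2, nrow = 10)
colnames(m) <- c("setID", "objID")

# Hypothetical helper: percentage of the distinct objIDs of set `a`
# that also occur in set `b` (assumed definition of "shared").
pct_shared_pair <- function(m, a, b) {
  obj_a <- unique(m[m[, "setID"] == a, "objID"])
  obj_b <- unique(m[m[, "setID"] == b, "objID"])
  100 * length(intersect(obj_a, obj_b)) / length(obj_a)
}

pct_shared_pair(m, 1, 2)  # 100
pct_shared_pair(m, 1, 3)  # 0
```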
The problem is that I have over 2000 setID values, which means choose(2000, 2) = 1,999,000 possible pairs. I was trying to do this with a for loop, but I imagine there has to be a faster way.
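One possible loop-free approach, sketched under assumptions: build a 0/1 incidence matrix (objID by setID) and let `crossprod()` count the shared objects for all pairs at once. The names `inc`, `shared`, and `pct_shared` are illustrative, and the percentage is assumed to be relative to the number of distinct objID values in the row set.

```r
m <- matrix(c(rep(1, 3), rep(2, 3), rep(3, 4), rep(2, 3), rep(2, 3), rep(3, 4)),
            ncol = 2, nrow = 10)
colnames(m) <- c("setID", "objID")

# Incidence matrix: one row per distinct objID, one column per distinct setID;
# 1 if the object occurs in that set, 0 otherwise.
tab <- unclass(table(m[, "objID"], m[, "setID"]))
inc <- (tab > 0) * 1

# shared[i, j] = number of distinct objIDs that sets i and j have in common;
# the diagonal holds the number of distinct objIDs in each set.
shared <- crossprod(inc)

# Assumed definition: shared objIDs divided by the distinct objIDs of the row set.
# The vector diag(shared) is recycled down the columns, so row i is divided
# by the size of set i.
pct_shared <- 100 * shared / diag(shared)
```

On the toy data, `pct_shared[1, 2]` is 100 and `pct_shared[1, 3]` is 0. With 2000 sets the result is a 2000 x 2000 matrix; if the number of distinct objID values is very large, a sparse incidence matrix (for example from the Matrix package) might be needed instead of the dense one used here.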
I also checked other posts about this, but the only one I found was about finding common rows between just two data frames.