Number of rows shared between different elements in a single dataframe in R

Question

This is a toy example of my dataframe:

m <- matrix(c(rep(1,3),rep(2,3),rep(3,4),rep(2,3),rep(2,3),rep(3,4)),
            ncol = 2,nrow = 10)
colnames(m)<-c("setID","objID")

> m
      setID objID
 [1,]     1     2
 [2,]     1     2
 [3,]     1     2
 [4,]     2     2
 [5,]     2     2
 [6,]     2     2
 [7,]     3     3
 [8,]     3     3
 [9,]     3     3
[10,]     3     3

What I would like to get is the percentage of objID values shared between each pair of setID values. In this toy example, setID 1 and setID 2 share 100% of their objID values, and setID 3 shares none with any other setID.

The problem is that I have over 2000 setID values, which means at least choose(2000, 2) = 1,999,000 possible pairs. I was trying to do this with a for loop, but I imagine there has to be a faster way.
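For reference, the pairwise approach described above could be sketched roughly as follows. This is only an illustration: the definition of "percentage shared" used here (objects common to both sets, divided by the number of distinct objects in the first set) is an assumption, since the question does not pin down the denominator, and the names `objs`, `pct_shared`, and `res` are made up for the example.

```r
# Toy data from the question
m <- matrix(c(rep(1, 3), rep(2, 3), rep(3, 4), rep(2, 3), rep(2, 3), rep(3, 4)),
            ncol = 2, nrow = 10)
colnames(m) <- c("setID", "objID")

# Distinct objIDs belonging to each setID (a list named "1", "2", "3")
objs <- lapply(split(m[, "objID"], m[, "setID"]), unique)

# Assumed definition: share of a's objects that also occur in b, in percent
pct_shared <- function(a, b) 100 * length(intersect(a, b)) / length(a)

# Every unordered pair of setIDs, one pair per column
pairs <- combn(names(objs), 2)

res <- data.frame(
  set1 = pairs[1, ],
  set2 = pairs[2, ],
  pct  = apply(pairs, 2, function(p) pct_shared(objs[[p[1]]], objs[[p[2]]]))
)
res
```

With around 2000 sets this evaluates close to two million pairs one at a time, which is the cost the question is trying to avoid.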

I also checked other posts about this, but the only one I found was about finding common rows between just two dataframes.

Your `data.frame` is a matrix according to the code you have posted. It is also unclear what the expected output should look like. — mtoto, Apr 28 '16 at 10:06
See [this post](http://stackoverflow.com/questions/19891278/r-table-of-interactions-case-with-pets-and-houses); not clear, but perhaps you might be looking for something like `tab = crossprod(table(m[, "objID"], m[, "setID"]) > 0L); (tab / diag(tab)) * 100`? — alexis_laz, Apr 28 '16 at 10:17
@alexis_laz When I use `table(m[,"objID"],m[,"objID"])` I don't get the frequency for the setID 1. Why is that? or am I missing something? — GabrielMontenegro, Apr 28 '16 at 10:32
@JavierM88: it should be _table(objID, **setID**)_; is that really a typo, or do you mean something else? — alexis_laz, Apr 28 '16 at 10:41
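The `crossprod` suggestion from the comments can be unpacked step by step on the toy data. This is a sketch, not a confirmed answer: it assumes the desired output is a setID-by-setID matrix in which entry [i, j] is the percentage of set i's distinct objIDs that also appear in set j. The variable names `inc`, `tab`, and `pct` are illustrative, and the multiplication by `1L` is an added step that turns the logical matrix into an integer one before the matrix product.

```r
# Toy data from the question
m <- matrix(c(rep(1, 3), rep(2, 3), rep(3, 4), rep(2, 3), rep(2, 3), rep(3, 4)),
            ncol = 2, nrow = 10)
colnames(m) <- c("setID", "objID")

# Incidence matrix: rows are objIDs, columns are setIDs,
# 1 where the set contains the object at least once, 0 otherwise
inc <- (table(m[, "objID"], m[, "setID"]) > 0L) * 1L

# crossprod(inc) is t(inc) %*% inc: entry [i, j] counts the objIDs
# present in both set i and set j; the diagonal holds each set's size
tab <- crossprod(inc)

# Dividing by diag(tab) recycles down the columns, so row i is divided
# by the number of distinct objIDs in set i
pct <- (tab / diag(tab)) * 100
pct
```

For the toy data, sets 1 and 2 come out at 100 against each other and at 0 against set 3. All pairs are computed in one matrix product, so no explicit loop over combinations is needed; the result has one row and one column per setID.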
