Finding Number of Groups that Contain Specific Pairs in Data Frame

Question

I am trying to find, for each specific pair of values, the number of groups (names) in a data frame that contain that pair. Here is an example of what I have done and the desired output.

Creating the data

df <- data.frame(name  = c("Sam", "Sam", "Sam", "Jason", "Jason", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "h", "h", "e", "f"))

I'm not interested in pairs that do not occur within at least one name, so before generating the pairs I drop duplicate name/value rows and any name that has only one value:

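df <- df[!duplicated(df[1:2]), ]                              # drop repeated name/value rows

df <- df[ave(rep(1, nrow(df)), df$name, FUN = length) > 1, ]  # keep names with 2+ values

pairs <- t(combn(as.character(unique(df$value)), 2))          # every candidate value pair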

Now I have two objects that look like this:

   name value
1   Sam     e
2   Sam     f
3   Sam     g
6 Kelly     e
7 Kelly     f

     [,1] [,2]
[1,] e    f   
[2,] e    g   
[3,] f    g

My Desired Output

   pair.1    pair.2  occurrences
1   e          f         2
2   e          g         1
3   f          g         1

Take a look at the `igraph` package. I'm sure you can find some dupes around here. And please don't embed lines in your MWE that mess up other people's environments. — David Arenburg, Dec 29 '15 at 16:35
[This post](http://stackoverflow.com/questions/19891278/r-table-of-interactions-case-with-pets-and-houses) should be helpful as a start; then add a `merge`: `merge(data.frame(val1 = pairs[, 1L], val2 = pairs[, 2L]), setNames(as.data.frame(as.table(crossprod(table(df)))), c("val1", "val2", "freq")))` — alexis_laz, Dec 29 '15 at 18:51
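A minimal base-R sketch of the `crossprod(table(df))` idea, assuming you start from the original seven-row `df` (no prior filtering). Rather than merging with `pairs`, it keeps the upper triangle of the co-occurrence matrix; the capping step and the variable names `inc`, `co`, `idx` and `res` are illustrative additions, not part of the comment.

```r
# Rebuild the sample data (original seven rows, before any filtering).
df <- data.frame(name  = c("Sam", "Sam", "Sam", "Jason", "Jason", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "h", "h", "e", "f"),
                 stringsAsFactors = FALSE)

# Incidence matrix: rows = names, columns = values. Cap counts at 1 so a
# value repeated within one name (Jason's two "h" rows) counts only once.
inc <- table(df$name, df$value)
inc[inc > 1] <- 1L

# crossprod(inc) = t(inc) %*% inc; entry [a, b] is the number of names
# that contain both value a and value b.
co <- crossprod(inc)

# Keep each unordered pair once (strict upper triangle) if it occurs at all.
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
res <- data.frame(pair.1      = rownames(co)[idx[, 1]],
                  pair.2      = colnames(co)[idx[, 2]],
                  occurrences = co[idx],
                  stringsAsFactors = FALSE)
res <- res[order(res$pair.1, res$pair.2), ]
rownames(res) <- NULL
print(res)
```

Names with a single distinct value only contribute to the diagonal, so in this variant the explicit "drop single-value names" step is not needed.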

akrun · Accepted Answer · 2015-12-29T16:38:12.550

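We merge the dataset with itself by 'name' and sort the two 'value' columns within each row. We then convert the result to a data.table, remove the rows where both 'value' elements are the same, and, grouped by the two 'value' columns, count the rows (.N) and divide by 2, because after sorting each pair appears twice per name (once from each original order).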

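library(data.table)
d1 <- merge(df, df, by = 'name')          # every value pair that shares a name
d1[-1] <- t(apply(d1[-1], 1, sort))       # put each pair in a fixed order
# drop self-pairs; each remaining pair appears twice per name, hence .N/2
setDT(d1)[value.x != value.y][, .N/2, .(value.x, value.y)]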
#   value.x value.y V1
#1:       e       f  2
#2:       e       g  1
#3:       f       g  1
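For readers without data.table, roughly the same self-merge can be sketched in base R. This assumes the original seven-row `df`; instead of sorting each row, it filters with `value.x < value.y`, which drops self-pairs and keeps each unordered pair only once per name, so no division by 2 is needed.

```r
# Rebuild the sample data and keep one row per name/value combination.
df <- data.frame(name  = c("Sam", "Sam", "Sam", "Jason", "Jason", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "h", "h", "e", "f"),
                 stringsAsFactors = FALSE)
df <- unique(df)

# Self-merge: every (value.x, value.y) combination that shares a name.
d1 <- merge(df, df, by = "name")

# value.x < value.y removes self-pairs and the reversed duplicate of each pair.
d1 <- d1[d1$value.x < d1$value.y, ]

# Count how many names contain each pair.
res <- aggregate(name ~ value.x + value.y, data = d1, FUN = length)
names(res)[3] <- "occurrences"
res <- res[order(res$value.x, res$value.y), ]
rownames(res) <- NULL
print(res)
```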

Or, using a method similar to the one in @jeremycg's post:

 setDT(df)[df, on='name', allow.cartesian=TRUE
     ][as.character(value)< as.character(i.value), .N, .(value, i.value)]

score 3 · Answer 2 · answered Dec 29 '15 at 16:31

Here's an answer using dplyr. See the comments inline for explanation:

library(dplyr) #load dplyr
df %>% #your data
 left_join(df, by = "name") %>% #merge against your own data
 filter(as.character(value.x) < as.character(value.y)) %>% #filter out any where the two are equal, and make sure we only have one of each pair
 group_by(value.x, value.y) %>% #group by the two vars
 summarise(n()) #count them

Source: local data frame [3 x 3]
Groups: value.x [?]

  value.x value.y   n()
   (fctr)  (fctr) (int)
1       e       f     2
2       e       g     1
3       f       g     1
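A third option, sketched here as an alternative rather than taken from either answer, skips the self-join and enumerates pairs within each name directly with `combn()`, the same function the question uses to build `pairs`. It assumes the original seven-row `df`; sorting each name's distinct values first makes every pair come out in the same orientation, so identical pairs can simply be tabulated.

```r
# Rebuild the sample data.
df <- data.frame(name  = c("Sam", "Sam", "Sam", "Jason", "Jason", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "h", "h", "e", "f"),
                 stringsAsFactors = FALSE)

# Distinct, sorted values per name; names with fewer than 2 values give no pairs.
groups <- lapply(split(df$value, df$name), function(v) sort(unique(v)))
groups <- groups[lengths(groups) >= 2]

# One two-column matrix of pairs per name, stacked into a single matrix.
all_pairs <- do.call(rbind, lapply(groups, function(v) t(combn(v, 2))))

# Tabulate the pairs and keep only those that actually occur.
tab <- as.data.frame(table(pair.1 = all_pairs[, 1], pair.2 = all_pairs[, 2]),
                     stringsAsFactors = FALSE)
res <- tab[tab$Freq > 0, ]
names(res)[names(res) == "Freq"] <- "occurrences"
rownames(res) <- NULL
print(res)
```

Each name contributes at most one copy of each pair, so `occurrences` is directly the number of names containing that pair.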
