
I am trying to count, for each pair of values, the number of groups (names) in a data frame that contain both values of the pair. Here is an example of what I have done and the desired output.

Creating the data

df=data.frame(c("Sam","Sam","Sam","Jason", "Jason", "Kelly", "Kelly"),
c("e","f","g","h", "h", "e", "f"))

names(df)=c('name','value')

I am not interested in pairs that never occur together within at least one name, so I drop duplicate rows and names that have only a single value before generating the pairs.

df=df[!duplicated(df[1:2]),]

df=df[ave(rep(1, nrow(df)), df$name, FUN=length)>1,]

pairs=t(combn(unique(df$value), 2))

Now I have two objects that look like this

   name value
1   Sam     e
2   Sam     f
3   Sam     g
6 Kelly     e
7 Kelly     f

     [,1] [,2]
[1,] e    f   
[2,] e    g   
[3,] f    g  

My Desired Output

  pair.1 pair.2 occurrences
1      e      f           2
2      e      g           1
3      f      g           1
Justin Klevs
  • Take a look at the `igraph` package. I'm sure you can find some duplicates around here. And please don't include lines in your MWE that mess with one's environment. – David Arenburg Dec 29 '15 at 16:35
  • [This post](http://stackoverflow.com/questions/19891278/r-table-of-interactions-case-with-pets-and-houses) should be a helpful start; then add a `merge`: `merge(data.frame(val1 = pairs[, 1L], val2 = pairs[, 2L]), setNames(as.data.frame(as.table(crossprod(table(df)))), c("val1", "val2", "freq")))` – alexis_laz Dec 29 '15 at 18:51
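
The one-liner in the last comment relies on a co-occurrence matrix, which may be easier to follow step by step. The following is only a minimal sketch of that idea, assuming the already filtered data from the question is rebuilt by hand with character columns; the names `incidence`, `cooc`, `idx` and `result` are illustrative and do not come from the original posts.

```r
# Filtered example data from the question, rebuilt by hand (assumption: character columns).
df <- data.frame(
  name  = c("Sam", "Sam", "Sam", "Kelly", "Kelly"),
  value = c("e", "f", "g", "e", "f"),
  stringsAsFactors = FALSE
)

# Rows are names, columns are values; entries are 0/1 because duplicate rows were dropped.
incidence <- table(df$name, df$value)

# t(incidence) %*% incidence: entry [i, j] counts the names containing both value i and value j.
cooc <- crossprod(incidence)

# Read only the upper triangle so that each unordered pair is listed once.
idx <- which(upper.tri(cooc), arr.ind = TRUE)
result <- data.frame(
  pair.1      = rownames(cooc)[idx[, "row"]],
  pair.2      = colnames(cooc)[idx[, "col"]],
  occurrences = cooc[idx],
  stringsAsFactors = FALSE
)
result
```

Unlike the merge-based approaches, this sketch also lists pairs that never share a name, with a count of 0, so a final filter on `occurrences > 0` may be wanted on other data.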

2 Answers


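We merge the dataset with itself by 'name', sort the two 'value' columns within each row so that every pair is stored in a consistent order, convert the result to a data.table, and remove the rows where both 'value' elements are the same. Grouped by the two 'value' columns, we then take the number of rows (.N) and divide by 2, because the self-merge produces every pair twice, once in each order.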

d1 <- merge(df, df, by.x='name', by.y='name')
d1[-1] <- t(apply(d1[-1], 1, sort))
library(data.table)
setDT(d1)[value.x!=value.y][,.N/2 ,.(value.x, value.y)]
#   value.x value.y V1
#1:       e       f  2
#2:       e       g  1
#3:       f       g  1
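
To see why the count is halved, it can help to look at the raw counts before the division. This is a base R sketch only, not the answer's own code: it assumes the filtered data from the question rebuilt by hand with character columns, and the names `lo`, `hi`, `raw` and `halved` are made up for illustration.

```r
# Filtered example data from the question, rebuilt by hand (assumption: character columns).
df <- data.frame(
  name  = c("Sam", "Sam", "Sam", "Kelly", "Kelly"),
  value = c("e", "f", "g", "e", "f"),
  stringsAsFactors = FALSE
)

# Self-merge on name: every ordered combination of two values within a name.
d1 <- merge(df, df, by = "name")

# Drop rows that pair a value with itself.
d1 <- d1[d1$value.x != d1$value.y, ]

# Put each pair in alphabetical order, so (f, e) becomes (e, f).
lo <- pmin(d1$value.x, d1$value.y)
hi <- pmax(d1$value.x, d1$value.y)

# Each unordered pair now appears twice per name, once for each original order.
raw <- table(paste(lo, hi))

# Halving the raw counts gives the number of names that contain the pair.
halved <- raw / 2
halved
```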

Or, using a method similar to the one in @jeremycg's post:

 setDT(df)[df, on='name', allow.cartesian=TRUE
     ][as.character(value)< as.character(i.value), .N, .(value, i.value)]
akrun

Here's an answer using dplyr. See the comments inline for explanation:

library(dplyr) #load dplyr
df %>% #your data
 left_join(df, by = "name") %>% #merge against your own data
 filter(as.character(value.x) < as.character(value.y)) %>% #filter out any where the two are equal, and make sure we only have one of each pair
 group_by(value.x, value.y) %>% #group by the two vars
 summarise(n()) #count them

Source: local data frame [3 x 3]
Groups: value.x [?]

  value.x value.y   n()
   (fctr)  (fctr) (int)
1       e       f     2
2       e       g     1
3       f       g     1
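
The strict `<` comparison in the filter does two jobs at once: it removes rows where both values are equal and keeps only one ordering of each pair, so no division by 2 is needed. For readers without dplyr, the same idea might be sketched in base R as below. The data is the filtered example rebuilt by hand with character columns, the helper column `n` and the names `joined`, `kept` and `out` are illustrative assumptions, and the output columns are renamed to match the question's desired output.

```r
# Filtered example data from the question, rebuilt by hand (assumption: character columns).
df <- data.frame(
  name  = c("Sam", "Sam", "Sam", "Kelly", "Kelly"),
  value = c("e", "f", "g", "e", "f"),
  stringsAsFactors = FALSE
)

# Join the data against itself on name.
joined <- merge(df, df, by = "name")

# Strict comparison: drops equal values and keeps a single ordering of every pair.
kept <- joined[joined$value.x < joined$value.y, ]

# Helper column of ones (hypothetical name) so the rows can be summed per pair.
kept$n <- 1

# Count the remaining rows for each pair of values.
out <- aggregate(n ~ value.x + value.y, data = kept, FUN = sum)
names(out) <- c("pair.1", "pair.2", "occurrences")
out <- out[order(out$pair.1, out$pair.2), ]
out
```

Note that `<` on strings follows the collation order of the current locale, which is harmless for the single lowercase letters used here.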
jeremycg