I'm working on a project where participants sort pictures into groups. My goal is to use R to create a table showing how many times each pair of pictures was sorted into the same group. The group names are arbitrary and change between participants. For example, pictures 2 and 4 may be in group B for user 1, but in group C for user 2. I just want to know the number of times they were in the same group. Here's an example of what the data would look like:

ID  Pic1     Pic2    Pic3    Pic4
1   GroupA   GroupB  GroupA  GroupC
2   GroupB   GroupA  GroupB  GroupA
3   GroupC   GroupA  GroupB  GroupC

What I'd like as output is this:

       Pic1    Pic2    Pic3    Pic4
Pic1           0       2       1
Pic2   0               0       1
Pic3   2       0               0
Pic4   1       1       0

My guess is that dplyr could do this somehow, but I can't figure out how to count co-occurrences for each user when the group names change. I can do this in Excel using VLOOKUP, but I'd like to avoid that if possible. Any ideas?

doctorj75

1 Answer

One option is to use outer and compare pairs of the columns with a vectorized comparison function:

outer(df[-1], df[-1], Vectorize(function(x, y) sum(x == y)))

#     Pic1 Pic2 Pic3 Pic4
#Pic1    3    0    2    1
#Pic2    0    3    0    1
#Pic3    2    0    3    0
#Pic4    1    1    0    3
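
To try this end to end, here is a minimal sketch that rebuilds the example data from the question and runs the same outer call. The object name co and the step that blanks the diagonal are assumptions added for illustration: the diagonal only counts each picture against itself (always the number of participants), and the desired output in the question leaves it empty.

```r
# Example data copied from the question; labels kept as character strings
# so that == compares the text of the group names row by row.
df <- data.frame(
  ID   = 1:3,
  Pic1 = c("GroupA", "GroupB", "GroupC"),
  Pic2 = c("GroupB", "GroupA", "GroupA"),
  Pic3 = c("GroupA", "GroupB", "GroupB"),
  Pic4 = c("GroupC", "GroupA", "GroupC"),
  stringsAsFactors = FALSE
)

# For every pair of picture columns, count the rows (participants)
# in which both pictures carry the same group label.
co <- outer(df[-1], df[-1], Vectorize(function(x, y) sum(x == y)))

# Assumption: self-pairs are not of interest, so blank out the diagonal.
diag(co) <- NA
co
```

Because the comparison is done within each row, it does not matter that the same label means different things for different participants.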

A longer alternative uses Map in place of Vectorize:

outer(df[-1], df[-1], function(x, y) Map(function(x, y) sum(x == y), x, y))
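
Because Map returns a list, the result of this call is a matrix whose cells are list elements rather than a plain numeric matrix. If a numeric matrix is wanted without Vectorize, one possible sketch is a pair of nested sapply calls. The names pics and co_mat are hypothetical, and the data frame is rebuilt from the question's example so the snippet runs on its own.

```r
# Rebuild the example data from the question as character columns.
df <- read.table(text = "
ID Pic1   Pic2   Pic3   Pic4
1  GroupA GroupB GroupA GroupC
2  GroupB GroupA GroupB GroupA
3  GroupC GroupA GroupB GroupC
", header = TRUE, stringsAsFactors = FALSE)

pics <- df[-1]  # drop the ID column, keep only the picture columns

# The inner sapply compares one column against every column and returns
# a named vector of counts; the outer sapply binds those vectors into a matrix.
co_mat <- sapply(pics, function(x) sapply(pics, function(y) sum(x == y)))
co_mat
```

This gives the same counts as the outer version above, including the diagonal, which equals the number of participants.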
Psidom