
I have a two-column data frame in which both variables are factors:

df

PLOT  INTERACTION 
 A    interact_type_1
 A    interact_type_2
 B    interact_type_3
 B    interact_type_4 
 C    interact_type_1
 D    interact_type_4
 E    interact_type_1
 E    interact_type_2
 E    interact_type_3
 E    interact_type_4
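
For anyone who wants to reproduce this, the example can be rebuilt roughly as follows. This is only a sketch: it assumes both columns are stored as factors, as described above, and `paste0` is used just to avoid typing the level names out.

```r
# Rebuild the example data frame shown above.
# stringsAsFactors = TRUE makes both columns factors.
df <- data.frame(
  PLOT = c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E"),
  INTERACTION = paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4)),
  stringsAsFactors = TRUE
)

str(df)
```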

I need a pairwise matrix whose rows and columns are the unique levels of the first variable (PLOT). Each cell should hold the number of INTERACTION values shared by the corresponding pair of PLOT levels. Because the matrix is symmetric, only the lower triangle needs to be filled; the diagonal (a plot compared with itself) and the upper triangle should be NA. For this example, the output matrix would look like:

output

   A    B    C    D    E
A  NA   NA   NA   NA   NA
B  0    NA   NA   NA   NA
C  1    0    NA   NA   NA
D  0    1    0    NA   NA
E  2    2    1    1    NA

I tried reshaping the data from long to wide format and then using a loop:

 df <- spread(df, df$PLOT, df$INTERACTION)

 similarity.matrix <- matrix(nrow=ncol(F.data), ncol=ncol(F.data))

 for (col in 1:ncol(F.data)) {
   matches <- F.data[,col] == F.data
   match.counts <- colSums(matches)
   match.counts[col] <- 0 # Set the same column comparison to zero.
   similarity.matrix[,col] <- match.counts
 }

but the first line fails with `Error: Invalid column specification`.

I appreciate your time and help! Thank you.

Danielle
  • See a similar post [here](https://stackoverflow.com/questions/19891278/table-of-interactions-case-with-pets-and-houses) -- `tcrossprod(table(df))` – alexis_laz May 25 '17 at 09:17
  • Possible duplicate of [Table of Interactions - Case with pets and houses](https://stackoverflow.com/questions/19891278/table-of-interactions-case-with-pets-and-houses) – Uwe May 26 '17 at 05:53

1 Answer

You could do it this way. First cross-tabulate PLOT against INTERACTION (here d is the data frame from the question):

x = xtabs(~PLOT+INTERACTION,d)
        INTERACTION
    PLOT interact_type_1 interact_type_2 interact_type_3 interact_type_4
       A               1               1               0               0
       B               0               0               1               1
       C               1               0               0               0
       D               0               0               0               1
       E               1               1               1               1
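
To see why this table helps: since each PLOT/INTERACTION pair occurs at most once here, every cell is 0 or 1, so multiplying two rows element-wise and summing counts the interactions both plots have. A small check, assuming the example data from the question (rebuilt here as `d` so the snippet runs on its own):

```r
# Example data from the question, rebuilt so this runs standalone.
d <- data.frame(
  PLOT = c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E"),
  INTERACTION = paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4)),
  stringsAsFactors = TRUE
)
x <- xtabs(~ PLOT + INTERACTION, d)

# A and E both have interact_type_1 and interact_type_2
shared_AE <- sum(x["A", ] * x["E", ])  # 2

# A and B have no interaction type in common
shared_AB <- sum(x["A", ] * x["B", ])  # 0
```

If a pair could occur more than once, the cells would exceed 1 and the product would no longer be a plain count of shared types; converting with something like `x > 0` first would be one way around that.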

Find the combinations of two among PLOT using combn:

n = length(unique(d$PLOT))
c = combn(1:n,2)

Then construct your matrix and fill its lower half:

m = matrix(nrow=n,ncol=n)
## For each pair of rows listed in c, count the interactions the two plots share:
## sum(x[y[1],]*x[y[2],]) adds up the columns where both rows contain a 1
m[lower.tri(m)] = apply(c,2,function(y) sum(x[y[1],]*x[y[2],]))

This returns:

      [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    0   NA   NA   NA   NA
[3,]    1    0   NA   NA   NA
[4,]    0    1    0   NA   NA
[5,]    2    2    1    1   NA
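
The result above has no row or column names. A possible variant, sketched under the assumption that the data are the example from the question, labels the matrix with the plot names and cross-checks it against the `tcrossprod` approach mentioned in the comments. The names `idx` and `tc` are just illustrative.

```r
# Example data from the question, rebuilt so this runs standalone.
d <- data.frame(
  PLOT = c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E"),
  INTERACTION = paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4)),
  stringsAsFactors = TRUE
)
x <- xtabs(~ PLOT + INTERACTION, d)
n <- nrow(x)

# All pairs of row indices, one pair per column
idx <- combn(n, 2)

# NA-filled matrix labelled with the plot names
m <- matrix(NA, n, n, dimnames = list(rownames(x), rownames(x)))

# combn() and lower.tri() both run column by column, so the order matches
m[lower.tri(m)] <- apply(idx, 2, function(y) sum(x[y[1], ] * x[y[2], ]))
m

# Cross-check: x %*% t(x) gives the same counts for every pair at once
tc <- tcrossprod(x)
tc[upper.tri(tc, diag = TRUE)] <- NA
```

The `tcrossprod` route avoids the explicit loop over pairs, which may matter when there are many plots.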
Lamia