I have a data frame which looks like this:

Fruit    Colour
Apple      Red
Apple      Green   
Cherry     Red
Lemon      Yellow 
Banana     Yellow
Blueberry  Purple
Grapes     Purple
Grapes     Green

And would like a matrix which looks like this:

           Apple    Cherry    Lemon    Banana    Blueberry    Grapes
Apple       0         1         0         0         0            1            
Cherry      1         0         0         0         0            0
Lemon       0         0         0         1         0            0
Banana      0         0         1         0         0            0
Blueberry   0         0         0         0         0            1         
Grapes      1         0         0         0         1            0

Each entry corresponds to the number of values in the Colour column shared by the two fruits.

I've tried something like this:

df1 <- dcast(fruit_frame, Fruit~Colour)

This gives me a data frame with the colours as columns, the fruits as rows, and the number of occurrences of each colour as values, but it isn't quite what I am looking for. Is there an easy way to do this in R or Python?

Thank you in advance.

1 Answer

An option in R would be

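# Fruit-by-Colour table times its own transpose counts the colours each pair of fruits shares
out <- tcrossprod(table(fruit_frame))
# Every fruit shares all of its colours with itself, so zero the diagonal
diag(out) <- 0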
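
To see the approach above on the question's sample data, here is a minimal self-contained sketch. The data frame is retyped from the question; the names `incidence` and `fruit_order`, and the final reordering step, are illustrative additions and not part of the original answer.

```r
# Rebuild the sample data from the question (fruit_frame is the question's name).
fruit_frame <- data.frame(
  Fruit  = c("Apple", "Apple", "Cherry", "Lemon", "Banana",
             "Blueberry", "Grapes", "Grapes"),
  Colour = c("Red", "Green", "Red", "Yellow", "Yellow",
             "Purple", "Purple", "Green"),
  stringsAsFactors = FALSE
)

# Fruit-by-Colour contingency table: how often each (fruit, colour) pair occurs.
# With no duplicated rows, every cell is 0 or 1.
incidence <- table(fruit_frame)

# incidence %*% t(incidence): entry [i, j] counts the colours fruits i and j share.
out <- tcrossprod(incidence)

# A fruit shares all of its colours with itself, so clear the diagonal.
diag(out) <- 0

# table() sorts labels alphabetically; reorder to the order of first appearance
# in the data (hypothetical helper name: fruit_order).
fruit_order <- unique(fruit_frame$Fruit)
out <- out[fruit_order, fruit_order]
print(out)
```

If the same fruit and colour pair could appear more than once in the data, the table cells would exceed 1 and the products would no longer be plain counts of shared colours; deduplicating with `unique(fruit_frame)` first is one way to guard against that.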
akrun