Create an overlap matrix from group1, group2, overlap_count?

Question

I have a dataframe with g1, g2 representing two groups, and a val column, representing a count of how many items from g1 can also be found in g2.

## Input dataframe
data.frame(
  g1 = c('a','a','a','b','b','b','c','c','c','d'),
  g2 = c('a','b','c','a','b','c','a','b','c','d'),
  val = c(10,4,1,4,5,0,1,0,3,4),
  stringsAsFactors = FALSE
)

I'm having trouble formatting the dataframe as below. I can create an empty, named matrix with distinct column/row names from g1/g2, and iterate over each row in the input dataframe, writing it's value to the matching g1: row-id g2: column-id combination, but that seems inefficient; I was wondering if any of the libraries provided a method to automate it?

## Output overlap matrix
data.frame(a = c(10,4,1,0),
           b = c(4,5,0,0),
           c = c(1,0,3,0),
           d = c(0,0,0,4),
           row.names = c('a','b','c','d'))

A similar question to overlap between groups has been asked before, but there- we have a list of groups, and items in it, and want to find how many items overlap between groups.

Here, I know how many items overlap between groups, but am having trouble formatting it the correct way.

@tmfmnk I disagreed with your dupe-hammer. Please see my answer. If you still think it is a duplicate, please let me know. Cheers. — M--, Sep 09 '20 at 19:07
@M-- not a problem, it is really a different question. On the other hand, I'm sure that this was also answered repeatedly. — tmfmnk, Sep 09 '20 at 19:15

score 2 · Answer 1 · answered Sep 09 '20 at 19:05

This is similar to How to reshape data from long to wide format with a slight difference that id column should be preserved as row names. For that we can use reshape2::acast.

reshape2::acast(Input_dataframe, formula = g1 ~ g2, fill = 0)

 # >    a b c d
 # > a 10 4 1 0
 # > b  4 5 0 0
 # > c  1 0 3 0
 # > d  0 0 0 4

score 2 · Answer 2 · answered Sep 09 '20 at 19:10

If you (or someone else) is looking for a tidyverse-solution, this can help

tidyr::pivot_wider(df, g1, names_from = g2, values_from = val, values_fill = 0) %>% 
  tibble::column_to_rownames("g1")
#    a b c d
# a 10 4 1 0
# b  4 5 0 0
# c  1 0 3 0
# d  0 0 0 4

score 2 · Answer 3 · answered Sep 09 '20 at 22:14

2

In base R, we can use xtabs

xtabs(val ~ g1 + g2, df1)
#   g2
#g1   a  b  c  d
#  a 10  4  1  0
#  b  4  5  0  0
#  c  1  0  3  0
#  d  0  0  0  4

answered Sep 09 '20 at 22:14

akrun

874,273
37
540
662

CorerMaximus · Accepted Answer · 2020-09-09T23:04:37.780

1

This solution uses the data.table library.

dcast(setDT(df), g1 ~ g2, value.var = "val", fill = 0)

edited Sep 09 '20 at 23:04

answered Sep 09 '20 at 19:19

CorerMaximus

653
5
15

If this is the answer you were chasing then the question is a duplicate. This doesn't give `g1` as rownames like how you've shown in the desired output above. Please try to ask your questions clearly and precisely. – M-- Sep 14 '20 at 17:46
It got close enough to where I wanted it to go. The next step was just renaming the rows with the values in the first column, and then deleting the first column. Apologies, I should have included that step in the answer. – CorerMaximus Oct 01 '20 at 04:11

Create an overlap matrix from group1, group2, overlap_count?

4 Answers4