3

I have a dataframe with g1, g2 representing two groups, and a val column, representing a count of how many items from g1 can also be found in g2.

## Input dataframe
data.frame(
  g1 = c('a','a','a','b','b','b','c','c','c','d'),
  g2 = c('a','b','c','a','b','c','a','b','c','d'),
  val = c(10,4,1,4,5,0,1,0,3,4),
  stringsAsFactors = FALSE
)

I'm having trouble formatting the dataframe as below. I can create an empty, named matrix with distinct column/row names from g1/g2, and iterate over each row in the input dataframe, writing it's value to the matching g1: row-id g2: column-id combination, but that seems inefficient; I was wondering if any of the libraries provided a method to automate it?

## Output overlap matrix
data.frame(a = c(10,4,1,0),
           b = c(4,5,0,0),
           c = c(1,0,3,0),
           d = c(0,0,0,4),
           row.names = c('a','b','c','d'))

A similar question to overlap between groups has been asked before, but there- we have a list of groups, and items in it, and want to find how many items overlap between groups.

Here, I know how many items overlap between groups, but am having trouble formatting it the correct way.

Ric S
  • 9,073
  • 3
  • 25
  • 51
CorerMaximus
  • 653
  • 5
  • 15
  • @tmfmnk I disagreed with your dupe-hammer. Please see my answer. If you still think it is a duplicate, please let me know. Cheers. – M-- Sep 09 '20 at 19:07
  • @M-- not a problem, it is really a different question. On the other hand, I'm sure that this was also answered repeatedly. – tmfmnk Sep 09 '20 at 19:15

4 Answers4

2

This is similar to How to reshape data from long to wide format with a slight difference that id column should be preserved as row names. For that we can use reshape2::acast.

reshape2::acast(Input_dataframe, formula = g1 ~ g2, fill = 0)

 # >    a b c d
 # > a 10 4 1 0
 # > b  4 5 0 0
 # > c  1 0 3 0
 # > d  0 0 0 4
M--
  • 25,431
  • 8
  • 61
  • 93
2

If you (or someone else) is looking for a tidyverse-solution, this can help

tidyr::pivot_wider(df, g1, names_from = g2, values_from = val, values_fill = 0) %>% 
  tibble::column_to_rownames("g1")
#    a b c d
# a 10 4 1 0
# b  4 5 0 0
# c  1 0 3 0
# d  0 0 0 4
Ric S
  • 9,073
  • 3
  • 25
  • 51
2

In base R, we can use xtabs

xtabs(val ~ g1 + g2, df1)
#   g2
#g1   a  b  c  d
#  a 10  4  1  0
#  b  4  5  0  0
#  c  1  0  3  0
#  d  0  0  0  4
akrun
  • 874,273
  • 37
  • 540
  • 662
1

This solution uses the data.table library.

dcast(setDT(df), g1 ~ g2, value.var = "val", fill = 0)
CorerMaximus
  • 653
  • 5
  • 15
  • If this is the answer you were chasing then the question is a duplicate. This doesn't give `g1` as rownames like how you've shown in the desired output above. Please try to ask your questions clearly and precisely. – M-- Sep 14 '20 at 17:46
  • It got close enough to where I wanted it to go. The next step was just renaming the rows with the values in the first column, and then deleting the first column. Apologies, I should have included that step in the answer. – CorerMaximus Oct 01 '20 at 04:11