
I have a data.frame in which each row holds a value for one pairwise combination of n samples. How can I expand it into the equivalent of an expand.grid() of every combination, keeping the value for each pair (in either order) and assigning, say, a value of 1 to rows where both samples are identical? My aim is to plot these in a 'correlation matrix' style plot (e.g. plots). I'm not sure whether there is an easier way to do this.

set.seed(123)
n <- 3
d <- as.data.frame(t(combn(letters[1:n], m = 2)), stringsAsFactors = FALSE)
d$value <- rnorm(nrow(d))
d
##   V1 V2      value
## 1  a  b -0.5604756
## 2  a  c -0.2301775
## 3  b  c  1.5587083

e <- expand.grid(letters[1:n], letters[1:n])
#e$value <- ?? 
# a-a, b-b, c-c will be e.g. 1
# a-b and b-a will be -0.5604
# a-c and c-a will be -0.2301
# b-c and c-b will be  1.5587

e
##   Var1 Var2
## 1    a    a
## 2    b    a
## 3    c    a
## 4    a    b
## 5    b    b
## 6    c    b
## 7    a    c
## 8    b    c
## 9    c    c
Asked by PeterQ

1 Answer


Here is an option using data.table. Convert the data.frame to a data.table with setDT(), setting 'V1' and 'V2' as the key columns, and join it with the cross join CJ() of all sample pairs. For rows where both samples are the same (V1 == V2), set 'value' to 1. Then group by pmax(V1, V2) and pmin(V1, V2), so that a-b and b-a fall into the same group, and replace 'value' with the non-NA value in that group.
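
To see why grouping by pmax() and pmin() pairs up a-b with b-a: both functions compare element-wise, so two labels give the same (larger, smaller) result whichever order they come in. A minimal base R illustration, using hypothetical vectors v1 and v2 that are not part of the question:

```r
# Hypothetical pair labels, each pair appearing in both orders
v1 <- c("a", "b", "a", "c")
v2 <- c("b", "a", "c", "a")

# pmax()/pmin() work element-wise on character vectors, so a-b and b-a
# produce the same (larger, smaller) combination and hence the same key
key <- paste(pmax(v1, v2), pmin(v1, v2))
print(key)
# [1] "b a" "b a" "c a" "c a"
```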

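library(data.table)
# Key 'd' on the sample columns, then join every pair from CJ();
# pairs missing from 'd' (b-a, a-a, ...) get NA in 'value'
d1 <- setDT(d, key = c("V1", "V2"))[CJ(letters[1:n], letters[1:n])][
  V1 == V2, value := 1][  # identical samples get 1
  , value := na.omit(value), .(pmax(V1, V2), pmin(V1, V2))][]  # a-b and b-a share a group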
d1
#   V1 V2      value
#1:  a  a  1.0000000
#2:  a  b -0.5604756
#3:  a  c -0.2301775
#4:  b  a -0.5604756
#5:  b  b  1.0000000
#6:  b  c  1.5587083
#7:  c  a -0.2301775
#8:  c  b  1.5587083
#9:  c  c  1.0000000
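
If you would rather avoid data.table, one base R route is to build the full symmetric matrix first and then reshape it to long format. This is a sketch that assumes each pair appears at most once in d; the names samples and m are illustrative, not from the question. Cells are filled through two-column character index matrices, which R matches against the matrix dimnames, and as.data.frame(as.table(m)) returns the rows in the same order as expand.grid(samples, samples):

```r
# Rebuild the example data from the question
set.seed(123)
n <- 3
d <- as.data.frame(t(combn(letters[1:n], m = 2)), stringsAsFactors = FALSE)
d$value <- rnorm(nrow(d))

# Square matrix with 1 everywhere; the diagonal (identical samples) keeps it
samples <- letters[1:n]
m <- matrix(1, nrow = n, ncol = n, dimnames = list(samples, samples))

# Fill both the (V1, V2) and the (V2, V1) cell for every pair,
# indexing by sample name via two-column character matrices
m[cbind(d$V1, d$V2)] <- d$value
m[cbind(d$V2, d$V1)] <- d$value

# Long format: one row per ordered pair, Var1 varying fastest
e <- as.data.frame(as.table(m), responseName = "value",
                   stringsAsFactors = FALSE)
print(e)
```

The intermediate matrix m is already the square layout a 'correlation matrix' style plot works from, so depending on the plotting function you use, the long format may not be needed at all.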
Answered by akrun