1

I have data that looks like this

  Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1

And i want to transform it to a crosstable that looks like this

        Cluster.y
Cluster.x  1  2  3  4  5  7
        1  0  2  0  8  0  1
        2  2  0  0  4  0  1
        3  0  0  0  0  0  0
        4  8  4  0  0  0  0
        5  0  0  0  0  0  0
        7  1  1  0  0  0  0

How can i do this transformation? It is important for me, that the table is symmetric and is readable as rows or columns with the same results.

craszer
  • 121
  • 7

3 Answers3

4

Firstly, create a matrix with zeros,

out <- matrix(0, 7,7)

Then assign your values into them by flipping columns,

out[as.matrix(df[,1:2])]  <- df[,3]
out[as.matrix(df[,2:1])]  <- df[,3]

gives,

#         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#    [1,]    0    2    0    8    0    0    1
#    [2,]    2    0    0    4    0    0    1
#    [3,]    0    0    0    0    0    0    0
#    [4,]    8    4    0    0    0    0    0
#    [5,]    0    0    0    0    0    0    0
#    [6,]    0    0    0    0    0    0    0
#    [7,]    1    1    0    0    0    0    0

Data:

df <- read.table(text="
 Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1",header=T)
maydin
  • 3,715
  • 3
  • 10
  • 27
  • Thanks for the input! It almost works, unfortunately, for some reason the matrix is not symmetric. I dont quite get why, since it seems to work for you. here is my dput: structure(c(0, 2, 0, 8, 0, 0, 1, 2, 0, 0, 4, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 3, 3, 0, 11, 3, 0, 4, 0, 0, 0, 7, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 12, 1, 0, 2), dim = c(7L, 7L)) – craszer Oct 21 '22 at 12:13
  • EDIT: With my data, [7,1] = 2 and [1,7] = 1. My goal is to have them both be the combined value 3. – craszer Oct 21 '22 at 12:21
  • 1
    with ``` out <- out + t(out)``` it seems to work :) – craszer Oct 21 '22 at 12:24
2

Here is a way. Get the unique elements in both cluster columns, coerce those columns to factor with the complete sequence as levels and tabulate.

To force a symmetric output, the function symMatrix inspired in this comment(*) works if the entries to be changed are zeros.

(*) See also the benchmark, answers are more permanent than comments.

df1 <- 'Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1'
df1 <- read.table(textConnection(df1), header = TRUE)

cols <- grep("Cluster", names(df1))
u <- unique(unlist(df1[cols]))
df1[cols] <- lapply(df1[cols], \(x) factor(x, levels = Reduce(`:`, range(u))))

symMatrix <- function(x) x + t(x)

xt <- xtabs(n ~ ., df1)
xt
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 0 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 0 0 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 0 0 0 0 0 0 0

symMatrix(xt)
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 2 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 8 4 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 1 1 0 0 0 0 0

Created on 2022-10-21 with reprex v2.0.2

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
2
library(Matrix)

dims <- max(dt$Cluster.x, dt$Cluster.y)

m <- sparseMatrix(i = dt$Cluster.x, j = dt$Cluster.y, x = dt$n, dims = c(dims, dims))

matrix(m, nrow = dims, ncol = dims, dimnames = list(1:dims, 1:dims))

#   1 2 3 4 5 6 7
# 1 0 0 0 8 0 0 1
# 2 2 0 0 4 0 0 1
# 3 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0

Edit

You can get the symetric one rowbinding the reverse values and remove duplicate rows. I use data.table here.

dt <- unique(data.table::rbindlist(list(
  dt, dt[, .(Cluster.x = Cluster.y, Cluster.y = Cluster.x, n = n)]
)))

This will result in:

  1 2 3 4 5 6 7
1 0 2 0 8 0 0 1
2 2 0 0 4 0 0 1
3 0 0 0 0 0 0 0
4 8 4 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 1 1 0 0 0 0 0
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22
  • Thanks for the solution! However, also here the matrix does not turn out symmetric in a way that I mean. For example, [7,1] is 0 and [1,7] is 1. My goal is to have them both be the sum of all occurances between 7 and 1, so in the case of your example solution both being 1 – craszer Oct 21 '22 at 12:18
  • `m <- sparseMatrix(i = c(dt$Cluster.x, dt$Cluster.y), j = c(dt$Cluster.y, dt$Cluster.x), x = rep(dt$n, 2), dims = c(dims, dims))` could work BUT it doubles n IF cluster.x = cluster.y in your original data – Merijn van Tilborg Oct 21 '22 at 12:40
  • or prepare your dt first, see my updated answer – Merijn van Tilborg Oct 21 '22 at 13:04