Transform observation count into crosstab

Question

I have data that looks like this

  Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1

And i want to transform it to a crosstable that looks like this

        Cluster.y
Cluster.x  1  2  3  4  5  7
        1  0  2  0  8  0  1
        2  2  0  0  4  0  1
        3  0  0  0  0  0  0
        4  8  4  0  0  0  0
        5  0  0  0  0  0  0
        7  1  1  0  0  0  0

How can i do this transformation? It is important for me, that the table is symmetric and is readable as rows or columns with the same results.

score 4 · Accepted Answer · answered Oct 21 '22 at 11:06

4

Firstly, create a matrix with zeros,

out <- matrix(0, 7,7)

Then assign your values into them by flipping columns,

out[as.matrix(df[,1:2])]  <- df[,3]
out[as.matrix(df[,2:1])]  <- df[,3]

gives,

#         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#    [1,]    0    2    0    8    0    0    1
#    [2,]    2    0    0    4    0    0    1
#    [3,]    0    0    0    0    0    0    0
#    [4,]    8    4    0    0    0    0    0
#    [5,]    0    0    0    0    0    0    0
#    [6,]    0    0    0    0    0    0    0
#    [7,]    1    1    0    0    0    0    0

Data:

df <- read.table(text="
 Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1",header=T)

answered Oct 21 '22 at 11:06

maydin

3,715
3
10
27

Thanks for the input! It almost works, unfortunately, for some reason the matrix is not symmetric. I dont quite get why, since it seems to work for you. here is my dput: structure(c(0, 2, 0, 8, 0, 0, 1, 2, 0, 0, 4, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 3, 3, 0, 11, 3, 0, 4, 0, 0, 0, 7, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 12, 1, 0, 2), dim = c(7L, 7L)) – craszer Oct 21 '22 at 12:13
EDIT: With my data, [7,1] = 2 and [1,7] = 1. My goal is to have them both be the combined value 3. – craszer Oct 21 '22 at 12:21
1

with ``` out <- out + t(out)``` it seems to work :) – craszer Oct 21 '22 at 12:24

Rui Barradas · Answer 2 · 2022-10-21T16:48:51.083

Here is a way. Get the unique elements in both cluster columns, coerce those columns to factor with the complete sequence as levels and tabulate.

To force a symmetric output, the function symMatrix inspired in this comment(*) works if the entries to be changed are zeros.

(*) See also the benchmark, answers are more permanent than comments.

df1 <- 'Cluster.x Cluster.y  n
         1         4   8
         1         7   1
         2         1   2
         2         4   4
         2         7   1'
df1 <- read.table(textConnection(df1), header = TRUE)

cols <- grep("Cluster", names(df1))
u <- unique(unlist(df1[cols]))
df1[cols] <- lapply(df1[cols], \(x) factor(x, levels = Reduce(`:`, range(u))))

symMatrix <- function(x) x + t(x)

xt <- xtabs(n ~ ., df1)
xt
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 0 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 0 0 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 0 0 0 0 0 0 0

symMatrix(xt)
#>          Cluster.y
#> Cluster.x 1 2 3 4 5 6 7
#>         1 0 2 0 8 0 0 1
#>         2 2 0 0 4 0 0 1
#>         3 0 0 0 0 0 0 0
#>         4 8 4 0 0 0 0 0
#>         5 0 0 0 0 0 0 0
#>         6 0 0 0 0 0 0 0
#>         7 1 1 0 0 0 0 0

^{Created on 2022-10-21 with reprex v2.0.2}

Thanks for the input! However, with symmetric i mean (maybe its not the right word?) that [7,1] = [1,7]. Any ideas on how to do that? — craszer, Oct 21 '22 at 12:22

Merijn van Tilborg · Answer 3 · 2022-10-21T13:00:24.313

2

library(Matrix)

dims <- max(dt$Cluster.x, dt$Cluster.y)

m <- sparseMatrix(i = dt$Cluster.x, j = dt$Cluster.y, x = dt$n, dims = c(dims, dims))

matrix(m, nrow = dims, ncol = dims, dimnames = list(1:dims, 1:dims))

#   1 2 3 4 5 6 7
# 1 0 0 0 8 0 0 1
# 2 2 0 0 4 0 0 1
# 3 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0

Edit

You can get the symetric one rowbinding the reverse values and remove duplicate rows. I use data.table here.

dt <- unique(data.table::rbindlist(list(
  dt, dt[, .(Cluster.x = Cluster.y, Cluster.y = Cluster.x, n = n)]
)))

This will result in:

  1 2 3 4 5 6 7
1 0 2 0 8 0 0 1
2 2 0 0 4 0 0 1
3 0 0 0 0 0 0 0
4 8 4 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 1 1 0 0 0 0 0

edited Oct 21 '22 at 13:00

answered Oct 21 '22 at 11:50

Merijn van Tilborg

5,452
1
7
22

Thanks for the solution! However, also here the matrix does not turn out symmetric in a way that I mean. For example, [7,1] is 0 and [1,7] is 1. My goal is to have them both be the sum of all occurances between 7 and 1, so in the case of your example solution both being 1 – craszer Oct 21 '22 at 12:18
`m <- sparseMatrix(i = c(dt$Cluster.x, dt$Cluster.y), j = c(dt$Cluster.y, dt$Cluster.x), x = rep(dt$n, 2), dims = c(dims, dims))` could work BUT it doubles n IF cluster.x = cluster.y in your original data – Merijn van Tilborg Oct 21 '22 at 12:40
or prepare your dt first, see my updated answer – Merijn van Tilborg Oct 21 '22 at 13:04

Transform observation count into crosstab

3 Answers3