2

Everyone, I have this binary matrix

rownames <- c("gene1", "gene2", "gene3", "gene4")
colnames <- c("A", "B", "C", "D")

data <- matrix(c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1),nrow = 4, ncol = 4, byrow = TRUE, dimnames = list(rownames, colnames))

data
#      A B C D
#gene1 1 0 1 0
#gene2 1 1 0 0
#gene3 1 0 1 0
#gene4 0 1 0 1

I would like to turn data into this Adjacency matrix for network igraph visualization.

   A B C D
A  0 1 2 0
B  1 0 0 1
C  2 0 0 0
D  0 1 0 0

Explanation: from A, gene2 has shared score "1" with B, that is why I gets 1 score. from A, gene1 and gene3 has shared score "total 2" with C, that is why it gets 2 score

Is it possible to change matrix into this? Should I use iteration? but I have no idea. Tidyverse approach will be very helpful.

Thank you

Henrik
  • 65,555
  • 14
  • 143
  • 159
choij
  • 227
  • 1
  • 7
  • 1
    [Create a co-occurrence matrix from dummy-coded observations](https://stackoverflow.com/questions/10622730/create-a-co-occurrence-matrix-from-dummy-coded-observations), but you can skip the `as.matrix` step, so simply: `\`diag<-\`(crossprod(data), 0)` – Henrik Mar 07 '21 at 11:47
  • @Henrik, how ` `diag<-` ` works here? – AnilGoyal Mar 07 '21 at 12:04
  • 1
    It's the replacement version of `diag` (`diag(x) <- value`) written in a functional form, as any replacement function could be. – Henrik Mar 07 '21 at 12:06

3 Answers3

3

You may make use of outer function.

count1s <- function(x, y) colSums(x == 1 & y == 1)
n <- 1:ncol(data)
mat <- outer(n, n, function(x, y) count1s(data[, x], data[, y]))
diag(mat) <- 0
dimnames(mat) <- list(colnames(data), colnames(data))
mat

#  A B C D
#A 0 1 2 0
#B 1 0 0 1
#C 2 0 0 0
#D 0 1 0 0
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

A tidyverse approach - welcome any feedback on make it more concise

library(tidyverse)

data_tibble <- as_tibble(data)
calculate_adjacency <- function(data, col_name) {
  column_names <- names(data)
  data %>%
    filter(!!sym(col_name) ==1) %>%
    select(!matches(col_name)) %>%
    pivot_longer(cols = everything(), names_to = "name", values_to = "value") %>%
    group_by(name) %>%
    summarize(value = sum(value)) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    mutate("{col_name}" := 0) %>%
    select(all_of(column_names))
}

map_df(colnames(data_tibble), calculate_adjacency, data = data_tibble) %>%
  mutate(key = colnames(.)) %>%
  relocate(key, .before = 1)

# A tibble: 4 x 5
  key       A     B     C     D
  <chr> <dbl> <dbl> <dbl> <dbl>
1 A         0     1     2     0
2 B         1     0     0     1
3 C         2     0     0     0
4 D         0     1     0     0
Sinh Nguyen
  • 4,277
  • 3
  • 18
  • 26
2

An easier tidyverse approach

data2 <- as.data.frame(data) %>% rownames_to_column("id") %>%
  pivot_longer(cols = -id) %>% filter(value != 0)

merge(data2, data2, by = "id", all = T) %>%
  filter(name.x != name.y) %>%
  group_by(name.x, name.y) %>%
  summarise(val = n()) %>%
  pivot_wider(names_from = name.y, values_from = val, values_fill = 0, names_sort = T) %>%
  column_to_rownames("name.x")

  A B C D
A 0 1 2 0
B 1 0 0 1
C 2 0 0 0
D 0 1 0 0

A more easier approach would be

data2 <- as.data.frame(data) %>% rownames_to_column("id") %>%
  pivot_longer(cols = -id) %>% filter(value != 0)

inter <- crossprod(xtabs(~id+name, data2), xtabs(~id+name, data2))
diag(inter) <- 0
inter

    name
name A B C D
   A 0 1 2 0
   B 1 0 0 1
   C 2 0 0 0
   D 0 1 0 0

I realised after linking the question that easiest is

inter <- crossprod(data)
diag(inter) <- 0
inter
  A B C D
A 0 1 2 0
B 1 0 0 1
C 2 0 0 0
D 0 1 0 0
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45