
I am trying to find common edges between coexpression networks of genes. Here is a toy example:

Dataset 1    Dataset 2    Dataset 3
A:B          A:B          A:B
D:E          NA           D:E

So by intersecting these columns, A:B is an edge to be included, but not D:E.

My issue is that an edge can be written either way round: either A:B or B:A. I also have A and B as separate columns. So any one data frame will look something like this:

Gene1    Gene2    Edge
A        B        A:B

or this:

Gene1    Gene2    Edge
B        A        B:A

This means when trying to intersect you could get something like the following:

Dataset 1    Dataset 2    Dataset 3    Dataset 4    Dataset 5
B:A          A:B          A:B          B:A          A:B

Matching the strings directly wouldn't work, as A:B and B:A would be treated as different edges even though the relationship is the same.

How do I subset a data frame so that I find a gene pair regardless of the order of the genes? Either by querying the string "gene1:gene2" or by using the column with Gene1 names and the column with Gene2 names.

Claire

2 Answers


I don't know if the following puts you close to what you need, but it does solve the problem of matching the strings.

Dataset1 <- data.frame(Edge = c("A:B", "D:E"))
Dataset2 <- data.frame(Edge = c("A:B", NA))
Dataset3 <- data.frame(Edge = c("A:B", "D:E"))

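# Sort the gene names inside each edge so that "B:A" and "A:B" both become "A:B".
# Note: an NA edge comes out as "" because sort() drops NA values.
splitSort <- function(x, split = ":"){
  x <- as.character(x)
  x <- strsplit(x, split)
  x <- lapply(x, function(y) paste(sort(y), collapse = split))
  unlist(x)
}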

e1 <- splitSort(Dataset1$Edge)
e2 <- splitSort(Dataset2$Edge)
e3 <- splitSort(Dataset3$Edge)
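# Keep only the edges present in every dataset
r <- Reduce(intersect, list(e1, e2, e3))

# Match on the normalised edges (e2), not the raw strings, so "B:A" rows are found too
i <- which(e2 %in% r)
Dataset2[i, , drop = FALSE]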
#  Edge
#1  A:B
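
If the separate Gene1 and Gene2 columns are available, the same normalisation could probably be done without splitting strings at all. The following is only a minimal sketch, assuming each data frame has character columns named Gene1 and Gene2 as in the question; the helper name edgeKey and the toy data frames d1 to d3 are made up for illustration, and NA gene names are not handled.

```r
# Hypothetical toy data: the same edge written in both orientations
d1 <- data.frame(Gene1 = c("B", "D"), Gene2 = c("A", "E"), stringsAsFactors = FALSE)
d2 <- data.frame(Gene1 = c("A", "F"), Gene2 = c("B", "G"), stringsAsFactors = FALSE)
d3 <- data.frame(Gene1 = c("A", "E"), Gene2 = c("B", "D"), stringsAsFactors = FALSE)

# Build an order-independent key: the alphabetically smaller gene always comes first
edgeKey <- function(df, sep = ":") {
  paste(pmin(df$Gene1, df$Gene2), pmax(df$Gene1, df$Gene2), sep = sep)
}

# One vector of keys per data frame, then keep the keys shared by all of them
keys <- lapply(list(d1, d2, d3), edgeKey)
common <- Reduce(intersect, keys)
common
# [1] "A:B"

# Rows of d1 whose edge is present in every data frame
d1[edgeKey(d1) %in% common, , drop = FALSE]
```

Because the key is computed from the columns rather than from the Edge string, the original Edge column can stay untouched and the key is only used for matching.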
Rui Barradas

I'm not sure exactly what you want, but here is my attempt. Maybe it helps if you just order your genes the same way within every edge.

df1 <- data.frame(
    Dataset1 = c("B:A", "E:A"),
    Dataset2 = c("A:B", "A:E"),
    Dataset3 = c("A:B", "A:B"),
    Dataset4 = c("B:A", "E:A"),
    Dataset5 = c("A:B", "B:A"),
    stringsAsFactors = FALSE
)
#      Dataset1 Dataset2 Dataset3 Dataset4 Dataset5
#1      B:A      A:B      A:B      B:A      A:B
#2      E:A      A:E      A:B      E:A      B:A

library(magrittr)
fun1 <- function(x) {
    strsplit(x,":") %>% lapply(sort) %>% lapply(paste0,collapse=":") %>% unlist
}

df1[] %<>% lapply(fun1)

df1
#  Dataset1 Dataset2 Dataset3 Dataset4 Dataset5
#1      A:B      A:B      A:B      A:B      A:B
#2      A:E      A:E      A:B      A:E      A:B
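
Once every column uses the same gene order, the edges shared by all data sets can be read off with a plain intersection. This is a rough base R sketch of that follow-up step, not part of the approach above as written; the names edges, normaliseEdges and shared are assumptions chosen for the example.

```r
# Hypothetical toy input: one column of edge strings per data set
edges <- data.frame(
  Dataset1 = c("B:A", "E:A"),
  Dataset2 = c("A:B", "A:E"),
  Dataset3 = c("A:B", "A:B"),
  stringsAsFactors = FALSE
)

# Sort the gene names inside each edge string (base R, no pipes needed)
normaliseEdges <- function(x, sep = ":") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(genes) paste(sort(genes), collapse = sep),
         character(1))
}

# Normalise every column in place, keeping the data frame structure
edges[] <- lapply(edges, normaliseEdges)

# Edges that occur in every column
shared <- Reduce(intersect, as.list(edges))
shared
# [1] "A:B"
```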
Andre Elrico