How to find networks within a data frame?

Question

If I have a data frame with two ID columns A and B, where each observation represents an edge (connection between two ids), what is the best way to determine all the disjoint ID groups? IDs can be present in either column and repeated.

By way of example, here is a test data frame along with what I would expect as a result:

df <- data.frame(A = rep(1:5, 2), B = c(3, 7:15))
#    A  B
#    1  3
#    2  7
#    3  8
#    4  9
#    5 10
#    1 11
#    2 12
#    3 13
#    4 14
#    5 15

# Proposed results
# Each element of the list represents a unique group
# [[1]]
# [1]  1  3  8 11 13
# 
# [[2]]
# [1]  2  7 12
# 
# [[3]]
# [1]  4  9 14
# 
# [[4]]
# [1]  5 10 15

Very similar to this old question of mine - [identify groups of linked episodes which chain together](http://stackoverflow.com/questions/12135971/identify-groups-of-linked-episodes-which-chain-together) — thelatemail, Feb 02 '16 at 01:42

score 1 · Answer 1 · answered Feb 02 '16 at 00:01

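Here's my proposed solution, though it feels overwrought for what seems like a relatively straightforward problem. It starts from a set of known IDs, adds every ID that shares a row with one of them, and repeats until the set stops growing: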

library(magrittr)

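find_relationships <- function(known_nodes, d){
  # known_nodes: vector of ids found so far
  # d: data frame of edges whose only columns are the ids A and B
  # Collect every id on a row that touches a known node, plus the known nodes
  new_nodes <- subset(d, A %in% known_nodes | B %in% known_nodes) %>%
    unlist %>%
    c(known_nodes) %>%
    unique

  # If no new ids turned up, the group is complete; otherwise keep expanding
  if(length(new_nodes) == length(known_nodes)){
    return(new_nodes)
  }
  else{
    Recall(new_nodes, d)
  }
}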

unique_ids <- unique(c(df$A, df$B))

# Run the search from every id, then drop the groups that were found more than once
results <- lapply(unique_ids, find_relationships, d = df) %>% unique
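
For comparison, the same grouping can be sketched in base R with a small union-find, which makes a single pass over the rows instead of re-scanning the data frame from every ID. This is only a minimal sketch under the same assumption as above (the edges live in columns `A` and `B`); the names `find_groups` and `find_root` are made up for illustration and there is no path compression or other tuning.

```r
# Hypothetical helper: group ids into connected components with union-find
find_groups <- function(d) {
  ids <- sort(unique(c(d$A, d$B)))
  # parent[i] is the index of the parent of ids[i]; every id starts as its own root
  parent <- seq_along(ids)

  # Follow parent pointers until reaching a node that is its own parent
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  a <- match(d$A, ids)
  b <- match(d$B, ids)
  for (k in seq_len(nrow(d))) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    # Merge two components by pointing the larger root index at the smaller one
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # Ids that share a root belong to the same group
  roots <- vapply(seq_along(ids), find_root, integer(1))
  unname(split(ids, roots))
}

df <- data.frame(A = rep(1:5, 2), B = c(3, 7:15))
groups <- find_groups(df)
```

On the test data frame this yields four groups, each sorted and ordered by its smallest ID, which matches the expected result in the question.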
