If I have a data frame with two ID columns A and B, where each observation represents an edge (a connection between two IDs), what is the best way to determine all the disjoint ID groups? An ID can appear in either column and may be repeated.

By way of example, here is a test data frame along with what I would expect as a result:

df <- data.frame(A = rep(1:5, 2), B = c(3, 7:15))
#    A  B
#    1  3
#    2  7
#    3  8
#    4  9
#    5 10
#    1 11
#    2 12
#    3 13
#    4 14
#    5 15

# Proposed results
# Each element of the list represents a unique group
# [[1]]
# [1]  1  3  8 11 13
# 
# [[2]]
# [1]  2  7 12
# 
# [[3]]
# [1]  4  9 14
# 
# [[4]]
# [1]  5 10 15
Rich Scriven
Soul Donut
Very similar to this old question of mine - [identify groups of linked episodes which chain together](http://stackoverflow.com/questions/12135971/identify-groups-of-linked-episodes-which-chain-together) – thelatemail Feb 02 '16 at 01:42

1 Answer

Here's my proposed solution, which feels overwrought given how straightforward the problem is:

library(magrittr)

find_relationships <- function(known_nodes, d){
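  # known_nodes: vector of IDs found so far; d: data frame of edges (columns A and B)
  # Collect every ID on an edge that touches a known node, then recurse until no new IDs appear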
  subset(d, A %in% known_nodes | B %in% known_nodes) %>%
    unlist %>%
    c(known_nodes) %>%
    unique -> new_nodes

  if(length(new_nodes) == length(known_nodes)){
    return(new_nodes)
  }
  else{
    Recall(new_nodes, d)
  }
}

unique_ids <- unique(c(df$A, df$B))

results <- lapply(unique_ids, find_relationships, d = df) %>% unique
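
The recursion above rescans the data frame once per starting ID. As a possible alternative, here is a minimal union-find sketch in base R that visits each edge once. The helper name `find_groups` is made up for illustration, and it assumes the edge columns are named A and B as in the question; treat it as a sketch rather than a tuned implementation.

```r
# Sketch only: `find_groups` is a hypothetical helper, not an existing function.
# Assumes the data frame has edge columns named A and B.
find_groups <- function(d) {
  ids <- unique(c(d$A, d$B))
  parent <- seq_along(ids)   # every ID starts as the root of its own tree

  # Follow parent pointers until reaching a node that is its own parent
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  a <- match(d$A, ids)       # positions of each edge's endpoints in `ids`
  b <- match(d$B, ids)
  for (k in seq_len(nrow(d))) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    # Merge the two trees by pointing the larger root index at the smaller one
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # IDs that share a root belong to the same group
  roots <- vapply(seq_along(ids), find_root, integer(1))
  unname(split(ids, roots))
}

df <- data.frame(A = rep(1:5, 2), B = c(3, 7:15))
groups <- find_groups(df)
```

For the test data frame, `groups` holds four vectors containing the same IDs as the proposed results in the question.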
Soul Donut