5

I struggle to come up with a title that describes what I'm trying to solve, so please comment if you have a better title!

The solution can be in R, Python, or SQL (Aster TeraData SQL to be exact, though a solution any SQL language is very helpful for learning purposes)

The problem: Given a list of pairs of items in no particular order, generate an output that links together all pairs that are related with at least one link.

Here is a simple example using R:

colone = c("a","b","u","e","f","f","j","z")
coltwo = c("b","c","c","a","g","h","h","y")
d <- data.frame(colone, coltwo)
d
  colone coltwo
1      a      b
2      b      c
3      u      c
4      e      a
5      f      g
6      f      h
7      j      h
8      z      y

Desired output (in any easily-readable data structure):

(a,b,c,e,u)
(f,g,h,j)
(y,z)

Essentially, the input is representing a graph of nodes and edges. The desired out is a list of all objects within the graph that are connected.

Any help or thoughts would be appreciated!

so13eit
  • 942
  • 3
  • 11
  • 22
  • The question is a bit unclear. I am not understanding the expected output. – ZdaR Jun 15 '15 at 14:51
  • @ZdaR I've updated the question with more explanation on the problem- I realized it was essentially a graph question. – so13eit Jun 15 '15 at 15:02
  • This is known as [tag:connected-components], which is the transitive closure of your relation. It is the standard example for a problem that needs an iterative approach, and cannot be modeled in pure SQL (and which is, thus, a motivation for PL/SQL and similar extensions). – Has QUIT--Anony-Mousse Jun 15 '15 at 15:20
  • @Anony-Mousse I've updated the question title to reflect my question better. Thanks also for the update on SQL- great to know! – so13eit Jun 15 '15 at 15:24

1 Answers1

3

In R, you could use the package igraph:

library(igraph)
gg <- graph.edgelist(as.matrix(d), directed=F)
split(V(gg)$name, clusters(gg)$membership)
#$`1`
#[1] "a" "b" "c" "u" "e"
#
#$`2`
#[1] "f" "g" "h" "j"
#
#$`3`
#[1] "z" "y"

And you can look at the graph using:

plot(gg)

This is based on an excellent answer by MrFlick here

talat
  • 68,970
  • 21
  • 126
  • 157
  • 1
    thanks very much- I realized a bit after I posted that I Was essentially asking a graph membership question! Thanks also for the link to what your answer is based on, I will make sure to go upvote his answer too. – so13eit Jun 15 '15 at 15:04