1

My goal is to build a knowledge graph of terms; for each term; I can (somehow easily) extract the immediate connections from that term to all other terms; the following table (could be stored in MySQL) is an example of what I can extract:

enter image description here

In each row from the above table; we see one immediate (UNDIRECTED) connection, and its weight (or strength). Note that all connections are undirected.

So the question is; can we figure out a connection between terms that are indirect? For example; one link between Leonardo Da Vinci and Michelangelo is through the term Italy; which could be represented as:

Leonardo Da Vinci -- 4 (weight) -- Italy -- 6 (weight) -- Michelangelo

Using PHP and mySQL, we can simply do the following;

<? include('db_settings.php'); ?>

<?php

    $con = mysqli_connect($myDB_server, $myDB_userName, $myDB_password, $myDB_name);

    if (mysqli_connect_errno($con))
        echo "Error :( <BR/>";

    $connectionFrom = 'Leonardo Da Vinci';

    $result = mysqli_query($con, "SELECT * FROM termLinks WHERE termLinks_t1 = '$connectionFrom'");

    while( $row = mysqli_fetch_array($result) )
    {
        $currConnection = $row[2];
        $newResult = mysqli_query($con, "SELECT * FROM termLinks WHERE termLinks_t2 = '$currConnection'");

        while ( $newRow = mysqli_fetch_array($newResult) )
        {
            if ( strcmp($newRow[1], $connectionFrom) != 0 )
               echo "There is a connection between " . $connectionFrom . " and " . $newRow[1] . " through " . $currConnection;
        }   

        echo "<BR/>";
    }

    mysqli_close($con);
?>

Which will result in the following:

There is a connection between Leonardo Da Vinci and Michelangelo through Italy There is a connection between Leonardo Da Vinci and Lorenzo de’ Medici through Renaissance

But in other situations; we may need to go through multiple links to find a connection; for example there exist a connection between Lorenzo de’ Medici and Michelangelo through the following:

Lorenzo de’ Medici -- Renaissance -- Leonardo Da Vinci -- Italy -- Michelangelo

What would be the best approach to extract all connections between all terms? I understand that this may be an extremely complicated problem to be solved; but I’m open for any suggestions in which I could perhaps build a data structure that I can use to extract all connections rather efficiently...

Roronoa Zoro
  • 1,013
  • 2
  • 15
  • 25
  • 1
    check out the [transitive closure](http://en.wikipedia.org/wiki/Transitive_closure) and [these answers](http://stackoverflow.com/questions/3517524/best-known-transitive-closure-algorithm-for-graph) – Walter Tross Jun 02 '13 at 17:25

1 Answers1

1

use mysql's GROUP_CONCAT, it will group together all the termLinks_t1 that have termLinks_t2 in common

SELECT 
    a.*,
    (SELECT 
        GROUP_CONCAT(b.termLinks_t1)
         FROM 
            termLinks b 
         WHERE 
            a.termLinks_t2 = b.termLinks_t2 AND
            a.termLinks_t1 != b.termLinks_t1
         GROUP BY
            b.termLinks_t1
         ) as connections 
FROM 
    termLinks

so it would return something like (assumes Lorenzo de’ Medici also has Italy as a link, otherwise connections would just be Michelangelo)

termLinks_t1         termLinks_t2    connections
Leonardo Da Vinci    Italy           Michelangelo, Lorenzo de’ Medici

As for the second case (the deep link) not sure if i find something ill reedit.

Patrick Evans
  • 41,991
  • 6
  • 74
  • 87