My goal is to build a knowledge graph of terms. For each term, I can (fairly easily) extract its immediate connections to all other terms. The following table (which could be stored in MySQL) is an example of what I can extract:
Each row of the above table holds one immediate connection and its weight (or strength). Note that all connections are undirected.
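Because every link is undirected, one way to hold such rows in memory is an adjacency array that records each link in both directions. The sketch below is only an illustration: the function name `buildAdjacency` and the `[term1, term2, weight]` row layout are assumptions, and the two sample rows are the links from the Leonardo Da Vinci example in this post.

```php
<?php
// Hypothetical helper: turns [term1, term2, weight] rows into an
// adjacency array. Each undirected link is stored in both directions.
function buildAdjacency(array $rows): array
{
    $graph = [];
    foreach ($rows as [$t1, $t2, $weight]) {
        $graph[$t1][$t2] = $weight;
        $graph[$t2][$t1] = $weight;
    }
    return $graph;
}

// Sample rows; in practice these would be read from the termLinks table.
$rows = [
    ['Leonardo Da Vinci', 'Italy', 4],
    ['Italy', 'Michelangelo', 6],
];

$graph = buildAdjacency($rows);
print_r($graph['Italy']); // the neighbours of Italy with their weights
```

Storing both directions costs twice the memory, but it means a lookup never has to care which column a term happened to be stored in.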
So the question is: can we figure out indirect connections between terms? For example, one link between Leonardo Da Vinci and Michelangelo is through the term Italy, which could be represented as:
Leonardo Da Vinci -- 4 (weight) -- Italy -- 6 (weight) -- Michelangelo
Using PHP and MySQL, we can simply do the following:
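<?php
include 'db_settings.php';

$con = mysqli_connect($myDB_server, $myDB_userName, $myDB_password, $myDB_name);
if (mysqli_connect_errno()) {
    // Stop here: the queries below cannot run without a connection.
    die("Error :( <BR/>");
}

$connectionFrom = 'Leonardo Da Vinci';

// First hop: rows whose first term is the starting term.
$from = mysqli_real_escape_string($con, $connectionFrom);
$result = mysqli_query($con, "SELECT * FROM termLinks WHERE termLinks_t1 = '$from'");
while ($row = mysqli_fetch_array($result)) {
    $currConnection = $row[2]; // third column, used as the intermediate term

    // Second hop: rows whose second term is the intermediate term.
    $via = mysqli_real_escape_string($con, $currConnection);
    $newResult = mysqli_query($con, "SELECT * FROM termLinks WHERE termLinks_t2 = '$via'");
    while ($newRow = mysqli_fetch_array($newResult)) {
        // Skip the row that leads straight back to the starting term.
        if (strcmp($newRow[1], $connectionFrom) != 0) {
            echo "There is a connection between " . $connectionFrom . " and " . $newRow[1] . " through " . $currConnection;
        }
    }
    echo "<BR/>";
}
mysqli_close($con);
?>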
This results in the following output:
There is a connection between Leonardo Da Vinci and Michelangelo through Italy
There is a connection between Leonardo Da Vinci and Lorenzo de’ Medici through Renaissance
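Note that the two queries only match the starting term in termLinks_t1 and the intermediate term in termLinks_t2, so a link stored in the opposite orientation would be missed even though links are undirected. As a rough sketch of the same two-hop lookup done in memory, assuming the links have already been loaded into an adjacency array that holds both directions: the function name `twoHopConnections` and the `$graph` layout are hypothetical, and the weights on the Renaissance links are placeholders (only 4 and 6 are given in this post).

```php
<?php
// Hypothetical helper: lists every term reachable from $from through
// exactly one intermediate term. Returns [target, intermediate] pairs.
function twoHopConnections(array $graph, string $from): array
{
    $found = [];
    foreach (array_keys($graph[$from] ?? []) as $via) {
        foreach (array_keys($graph[$via] ?? []) as $to) {
            // PHP turns numeric-looking keys into ints, so compare as strings.
            if ((string) $to !== $from) {
                $found[] = [(string) $to, (string) $via];
            }
        }
    }
    return $found;
}

// Undirected links stored in both directions.
// The weight 1 on the Renaissance links is a made-up placeholder.
$graph = [
    'Leonardo Da Vinci'  => ['Italy' => 4, 'Renaissance' => 1],
    'Italy'              => ['Leonardo Da Vinci' => 4, 'Michelangelo' => 6],
    'Michelangelo'       => ['Italy' => 6],
    'Renaissance'        => ['Leonardo Da Vinci' => 1, 'Lorenzo de’ Medici' => 1],
    'Lorenzo de’ Medici' => ['Renaissance' => 1],
];

$connections = twoHopConnections($graph, 'Leonardo Da Vinci');
foreach ($connections as [$to, $via]) {
    echo "There is a connection between Leonardo Da Vinci and $to through $via\n";
}
```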
But in other situations, we may need to go through multiple links to find a connection. For example, there exists a connection between Lorenzo de’ Medici and Michelangelo through the following:
Lorenzo de’ Medici -- Renaissance -- Leonardo Da Vinci -- Italy -- Michelangelo
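To make the multi-hop case concrete, here is a minimal sketch of a breadth-first search over an in-memory adjacency array. It returns one path with the fewest links between two terms and ignores the weights; it is an illustration, not necessarily the best approach. The function name `findPath` and the `$graph` layout are assumptions, and the weights on the Renaissance links are placeholders, since only 4 and 6 are given above.

```php
<?php
// Hypothetical helper: breadth-first search for a path with the fewest
// links between $from and $to. Returns the terms in order, or null.
function findPath(array $graph, string $from, string $to): ?array
{
    $previous = [$from => null]; // visited terms and the term we came from
    $queue = [$from];
    while ($queue) {
        $term = array_shift($queue);
        if ($term === $to) {
            // Walk back from the target to rebuild the path.
            $path = [];
            for ($t = $to; $t !== null; $t = $previous[$t]) {
                array_unshift($path, $t);
            }
            return $path;
        }
        foreach (array_keys($graph[$term] ?? []) as $next) {
            $next = (string) $next; // numeric-looking keys come back as ints
            if (!array_key_exists($next, $previous)) {
                $previous[$next] = $term;
                $queue[] = $next;
            }
        }
    }
    return null; // the two terms are not connected at all
}

// Undirected links stored in both directions.
// The weight 1 on the Renaissance links is a made-up placeholder.
$graph = [
    'Leonardo Da Vinci'  => ['Italy' => 4, 'Renaissance' => 1],
    'Italy'              => ['Leonardo Da Vinci' => 4, 'Michelangelo' => 6],
    'Michelangelo'       => ['Italy' => 6],
    'Renaissance'        => ['Leonardo Da Vinci' => 1, 'Lorenzo de’ Medici' => 1],
    'Lorenzo de’ Medici' => ['Renaissance' => 1],
];

$path = findPath($graph, 'Lorenzo de’ Medici', 'Michelangelo');
echo implode(' -- ', $path), "\n";
```

A breadth-first search finds only one shortest path per pair of terms. Listing all paths, or preferring paths with the strongest weights, would need a different traversal.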
What would be the best approach to extract all connections between all terms? I understand that this may be an extremely complicated problem to solve, but I’m open to any suggestions, for example a data structure I could build to extract all connections reasonably efficiently.