Multiple Self-Joins to Find Transitive Subsets (cliques)

Question

In simplest terms, I have a table representing a relation. Rows in the table represent pairs in my relation. In other words, the first row indicates that the id of 1 is related to the id of 4 and that the id of 4 is related to the id of 1. Hopefully, it is not hard for you to see that my relation is symmetric, although the table shows this symmetry in a concise form.

+-----+-----+  
| id1 | id2 |  
+-----+-----+  
|   1 |   4 |  
|   3 |   1 |  
|   2 |   1 |  
|   2 |   3 |  
|   2 |   4 |  
|   5 |   1 |  
+-----+-----+

EDIT This table is meant to concisely show the following relation:
{(1,4), (4,1), (3,1), (1,3), (2,1), (1,2), (2,3), (3,2), (2,4), (4,2), (5,1), (1,5)}. This can be visualized by the following undirected graph.

CREATE TABLE Test (
id1 int not null,
id2 int not null);

INSERT INTO Test
VALUES
(1,4),
(3,1),
(2,1),
(2,3),
(2,4),
(5,1);

I would like to identify transitive subsets (cliques) in my table.
EDIT For example, I would like to identify the transitive subset demonstrated by the fact that the id of 3 is related to the id of 1 and the id of 1 is related to the id of 2 implies that the id of 3 is related to the id of 2. (These can be seen as triangles in the undirected graph photo. Although, in the best case scenario, I would like to be able to list other complete subgraphs that are larger than triangles if they are present in the original table/graph.)

I've tried doing the following but the result set is larger than I want it to be. I hope that there is an easier way.

select t1.id1, t1.id2, t2.id1, t2.id2, t3.id1, t3.id2
from test as t1
    join test as t2
        on t1.id1 = t2.id1
        or t1.id2 = t2.id2
        or t1.id1 = t2.id2
        or t1.id2 = t2.id1
    join test as t3
        on t2.id1 = t3.id1
        or t2.id2 = t3.id2
        or t2.id1 = t3.id2
        or t2.id2 = t3.id1
where
    not
    (
        t1.id1 = t2.id1
        and
        t1.id2 = t2.id2
    )
    and not
    (
        t2.id1 = t3.id1
        and
        t2.id2 = t3.id2
    )
    and not
    (
        t1.id1 = t3.id1
        and
        t1.id2 = t3.id2
    )
    and
    (
        (
            t3.id1 = t1.id1
            or
            t3.id1 = t1.id2
            or
            t3.id1 = t2.id1
            or
            t3.id1 = t2.id2
        )
        and
        (
            t3.id2 = t1.id1
            or
            t3.id2 = t1.id2
            or
            t3.id2 = t2.id1
            or
            t3.id2 = t2.id2
        )
    );

Actual Output:

+-----+-----+-----+-----+-----+-----+
| id1 | id2 | id1 | id2 | id1 | id2 |
+-----+-----+-----+-----+-----+-----+
|   1 |   4 |   2 |   4 |   2 |   1 |
|   1 |   4 |   2 |   1 |   2 |   4 |
|   3 |   1 |   2 |   3 |   2 |   1 |
|   3 |   1 |   2 |   1 |   2 |   3 |
|   2 |   1 |   2 |   4 |   1 |   4 |
|   2 |   1 |   2 |   3 |   3 |   1 |
|   2 |   1 |   3 |   1 |   2 |   3 |
|   2 |   1 |   1 |   4 |   2 |   4 |
|   2 |   3 |   2 |   1 |   3 |   1 |
|   2 |   3 |   3 |   1 |   2 |   1 |
|   2 |   4 |   2 |   1 |   1 |   4 |
|   2 |   4 |   1 |   4 |   2 |   1 |
+-----+-----+-----+-----+-----+-----+

The expected result set would only have two rows. Each row would represent a transitive relation that is a subset of the original relation.

╔═════╦═════╦═════╦═════╦═════╦═════╗
║ id1 ║ id2 ║ id1 ║ id2 ║ id1 ║ id2 ║
╠═════╬═════╬═════╬═════╬═════╬═════╣
║   1 ║   4 ║   2 ║   4 ║   2 ║   1 ║
║   3 ║   1 ║   2 ║   1 ║   2 ║   3 ║
╚═════╩═════╩═════╩═════╩═════╩═════╝

EDIT The expected output could also look like,

╔═════╦═════╦═════╗
║ id1 ║ id2 ║ id3 ║
╠═════╬═════╬═════╣
║   1 ║   4 ║   2 ║
║   3 ║   1 ║   2 ║
╚═════╩═════╩═════╝,

whatever is easier. I just need to display the fact that the sets
{(1,4), (4,1), (2,4), (4,2), (2,1), (1,2)}
and
{(3,1), (1,3), (2,1), (1,2), (2,3), (3,2)}
are proper subsets of the original relation and are themselves transitive relations. I am using the definition that a relation R is transitive if and only if ∀a∀b∀c((a,b)∈R ∧ (b,c)∈R → (a,c)∈R). In other words, I am trying to find all subgraphs that are also complete graphs.

I'm new to graphy theory, but it seems like my problem is similar to the clique problem where I am looking for cliques containing 3 or more vertices. I would accept as an answer solutions that return only cliques with 3 vertices. My question is similar to this one. However, the solutions presented there don't seem to use the definition of a clique that I want where every vertex is connected to every other vertex inside of the clique.

Here is an algorithm I found using Java. Hopefully, this will help with an implementation using SQL.

This table can't represent a relationship where a node is related only to itself. Is that OK? — philipxy, Jul 10 '19 at 23:21
Yes, I do not need nodes that are only related to themselves. — Escherichia, Jul 10 '19 at 23:25
You don't clearly describe how the output is a function of the input. PS You don't clearly distinguish between the table's relation(ship)/predicate, the one it represents & associated directed & undirected graphs. Exact details matter. Such lack of detail is likely an impediment to reasoning, coding, searching & asking. Also suggest that you find "transitive subsets" before you pivot edges into your final format. PS Please clarify via edits, not comments. — philipxy, Jul 10 '19 at 23:56
Learn about CTEs including re transitive closures involving arbitrary path lengths. [How to create a MySQL hierarchical recursive query](https://stackoverflow.com/questions/20215744/how-to-create-a-mysql-hierarchical-recursive-query) — philipxy, Jul 10 '19 at 23:58
Explain what a "transitive subset" is. What are the sub-relations you want the transitive closures of? Are you trying to say something about cycles? What specification were you given? Then please make the effort to use words to explain how the example outputs you give are portraying them. This is part of writing correct code. If you don't bother, why should we? Please apply my comments re details to your coding & asking. PS "where loops are not shown" is not clear. (The lack of clarity maybe involves confusing the given table vs a represented relation and/or a graph associated with that.) — philipxy, Jul 11 '19 at 01:47
Masterul use of ASCII [walls](https://www.bing.com/images/search?view=detailV2&ccid=d4TtOHa9&id=6DEA4C38F5ED33FB4FF86000F721F36B85B929D6&thid=OIP.NAdh4RPyyiw967ceaaWNqgHaFJ&mediaurl=http%3a%2f%2frecollectionsofplay.files.wordpress.com%2f2011%2f12%2frogue-2.jpg&exph=400&expw=575&q=rogue+computer+game&simid=608051687414499265&selectedIndex=0&ajaxhist=0). — Kit, Jul 11 '19 at 03:43
Now the problem is more clear, thanks. I would say SQLCR is the way to go, even though it is possible to solve the problem using recursive CTE I think...let me try — mauridb, Jul 11 '19 at 20:09
We already know what transitive & transitive closure are. You needed to tell us what relations you wanted the transitive closure of. You still haven't. Your "in other words" does not introduce a rephrasing. It does precede a clearer statement of what you want than "transitive subset". Notice also that you are phrasing things in terms of subgraphs but you are still not saying what graph you want the subgraphs of. (You're still not clear re relations & graphs.) You haven't addressed the rest of my comments. Read your post & see what you are actually saying & clearly say what you want. — philipxy, Jul 11 '19 at 21:16
My question is similar to: [link](https://stackoverflow.com/questions/30389474/how-many-complete-graph-present-in-given-undirected-graph) — Escherichia, Jul 11 '19 at 23:09
This seems to the goal "I am trying to find all subgraphs that are also complete graphs.". This means that transitive closure is actually not needed at all. — mauridb, Jul 12 '19 at 00:17
Yep, did some test, using the Complete Graph reference on wikipedia and recursive cte is not needed, it seems more a relational division problem. Quite interesting one, I'm testing a solution :) — mauridb, Jul 12 '19 at 01:07

score 2 · Answer 1 · answered Jul 10 '19 at 23:40

Times ago I needed to create clusters of data using transitive closure. The best way to do that was using SQLCLR. Here's the GitHub code (article to detailed link is there too)

https://github.com/yorek/non-scalar-uda-transitive-closure

That could be a good starting point. Can you also be more precise on what is the expected result with the input data you have in the sample?

score 0 · Answer 2 · answered Jul 13 '19 at 04:42

Here's the solution. It is based on the idea that a complete graph contains all the possible combination of its subgraphs. Code is here, I'll comment it in detail over the weekend, but it case I couldn't at least you have the code right on monday. Please be aware that this is quite a brute force approach and if you need graph bigger than 30 nodes won't work. Still I think it's a nice sample of "lateral thinking". Enjoy:

/*
    Create table holding graph data.
    Id1 and Id2 represent the vertex of the graph.
    (Id1, Id2) represent and edge.

    https://stackoverflow.com/questions/56979737/multiple-self-joins-to-find-transitive-subsets-cliques/56979901#56979901
*/
DROP TABLE IF EXISTS #Graph;
CREATE TABLE #Graph (Id1 INT, Id2 INT);
INSERT INTO 
    #Graph
VALUES
    (1,2)
    ,(1,3)
    ,(1,4)
    ,(2,3)
    ,(2,4)
    ,(5,1)
    --,(4,3) -- Uncomment this to create a complete subgraph of 4 vertex
;
GO

/*
    Create Numbers Table
*/
DROP TABLE IF EXISTS #Numbers;
SELECT TOP (100000)
    ROW_NUMBER() OVER(ORDER BY A.[object_id]) AS Num
INTO 
    #Numbers
FROM 
    sys.[all_columns] a CROSS JOIN sys.[all_columns] b
ORDER BY 
    Num
GO

/*
    Make sure Id1 is always lower then Id2.
    This can be done as the graph is undirected
*/

DROP TABLE IF EXISTS #Graph2;
SELECT DISTINCT
    CASE WHEN Id1<Id2 THEN Id1 ELSE Id2 END AS Id1,  
    CASE WHEN Id1<Id2 THEN Id2 ELSE Id1 END AS Id2  
INTO
    #Graph2
FROM 
    #Graph;
GO

/*
    Turn edges into single columns
*/
DROP TABLE IF EXISTS #Graph3;
SELECT 
    CAST(Id1 AS VARCHAR(MAX)) + '>'  + CAST(Id2 AS VARCHAR(MAX)) COLLATE Latin1_General_BIN2 AS [Edge] 
INTO 
    #Graph3 
FROM 
    #Graph2;

/*
    Get the list of all the unique vertexes
*/
DROP TABLE IF EXISTS #Vertex;
WITH cte AS
(
    SELECT Id1 AS Id FROM #Graph
    UNION 
    SELECT Id2 AS Id FROM #Graph
)
SELECT * INTO #Vertex FROM cte;

/*
    Given a complete graph with "n" vertexes, 
    calculate all the possibile complete cyclic subgraphs
*/
-- From https://stackoverflow.com/questions/3686062/generate-all-combinations-in-sql
-- And changed to return all combinations complete cyclic subgraphs 
DROP TABLE IF EXISTS #AllCyclicVertex;
DECLARE @VertexCount INT = (SELECT COUNT(*) FROM [#Vertex]);
WITH Nums AS 
(
    SELECT 
        Num
    FROM 
        #Numbers
    WHERE 
        Num BETWEEN 0 AND POWER(2, @VertexCount) - 1
), BaseSet AS 
(
    SELECT 
        I = POWER(2, ROW_NUMBER() OVER (ORDER BY [Id]) - 1), *
   FROM 
        [#Vertex]
), Combos AS 
(
    SELECT
        CombId = N.Num,
        S.Id,
        K = COUNT(*) OVER (PARTITION BY N.Num)
   FROM
        Nums AS N
    INNER JOIN 
        BaseSet AS S ON N.Num & S.I <> 0
)
SELECT
    DENSE_RANK() OVER (ORDER BY K, [CombID]) AS CombNum,
    K,
    Id
INTO
    #AllCyclicVertex
FROM 
    Combos
WHERE 
    K BETWEEN 3 AND @VertexCount
ORDER BY 
    CombNum, Id;
GO

--SELECT * FROM [#AllCyclicVertex]

/*
    Calculate the edges for the calculated cyclic graphs
*/
DROP TABLE IF EXISTS #WellKnownPatterns;
CREATE TABLE #WellKnownPatterns ([Name] VARCHAR(100), [Id1] INT, [Id2] INT, [Edge] VARCHAR(100) COLLATE Latin1_General_BIN2);

INSERT INTO #WellKnownPatterns 
    ([Name], [Id1], [Id2], [Edge])
SELECT 
    CAST(a.[CombNum] AS VARCHAR(100)) + '/' + CAST(a.[K] AS VARCHAR(100)),
    a.Id AS Id1, 
    b.Id AS Id2,
    CAST(a.[Id] AS VARCHAR(MAX)) + '>'  + CAST(b.[Id] AS VARCHAR(MAX)) AS [Edge]
FROM 
    #AllCyclicVertex a 
INNER JOIN 
    #AllCyclicVertex b ON b.id > a.id AND a.[CombNum] = b.[CombNum]
;

-- SELECT * FROM [#WellKnownPatterns]

/*
    Now take from the original set only those 
    who are EXACT RELATIONAL DIVISION of a well-known cyclic graph
*/

WITH cte AS
(
    SELECT * FROM #Graph3
),
cte2 AS 
(
    SELECT 
        COUNT(*) OVER (PARTITION BY [Name]) AS [EdgeCount],
        * 
    FROM 
        #WellKnownPatterns
)
SELECT
    T1.[Name]
FROM
    cte2 AS T1
LEFT OUTER JOIN
    cte AS S ON T1.[Edge] = S.[Edge]
GROUP BY
    T1.[Name]
HAVING 
    COUNT(S.[Edge]) = MIN(T1.[EdgeCount])
GO

-- Test a solution
SELECT * FROM [#WellKnownPatterns] WHERE [Name] = '1/3'

I was hoping for an easier way, but I guess I can't have my cake and eat it too. — Escherichia, Jul 15 '19 at 18:10

Multiple Self-Joins to Find Transitive Subsets (cliques)

2 Answers2