SPARQL query for collapsing branches of a tree (summarizing topology)

Question

Let's say we have the same tree of this question

A, D, E and H are 'special' nodes belonging to the class :Special. Here is the tree definition

@prefix : <http://example.org#> .

:orgA :hasSuborganization :orgB, :orgC, :orgD.
:orgB :hasSuborganization :orgE, :orgF.
:orgE :hasSuborganization :orgG.
:orgG :hasSuborganization :orgH.
:orgA a :Special .
:orgD a :Special .
:orgE a :Special .
:orgH a :Special .

I'd like to get the same tree of the starting one but with only special nodes. A kind of summary of the starting topology. E.g., Expected output:

  A
 _|_
|   |
E   D
|
H

I'd like to get that with a SPARQL query. My starting point:

@prefix : <http://example.org#> .

select ?node ?node2 (count(?mid) as ?distance) where {
    ?node :hasSuborganization* ?mid .
    ?mid :hasSuborganization+ ?node2 .
    ?node2 a :Special .
    {
        select * where { 
            <http://example.org#orgA> :hasSuborganization* ?node .
            ?node a :Special .
        }
    }
} group by ?node ?node2

In this way I get the distance of each couple of special nodes in the tree.

How can I filter just the super-sub relations (i.e., A-D,A-E,E-H)? I believed that filtering the rows with the minimum values in my result-set was enough. Actually, it fails if one :Special node has :Special descendants at different height (e.g., distance(A-D) = 1, distance(A-E) = 2).

Probably I need something different.

SPARQL 1.1 supports negation by means of `FILTER NOT EXISTS` - you could try this to get at least root and leaf nodes of a tree. — UninformedUser, Jul 09 '18 at 18:52
I’m not sure of having understood your hint. The innermost query of my question already returns root and leaves. The problem I’m facing is building correct relations among them :) — floatingpurr, Jul 09 '18 at 23:59
Are we talking about the same query? Why should your inner-most query return leaf nodes? A leaf node doesn't have children, or not? Your query just returns nodes that are suborganizations (transitively) of `A` and , in addition, are of type `Special`. But correct me if I'm wrong. — UninformedUser, Jul 10 '18 at 05:55
You are right. I guess there was a semantic misunderstanding on the term leaf (I used it in the wrong way. I'm going to clarify it in the post). My innermost query returns just the collection of the nodes that I want to see in my summarized tree. Nevertheless, I cannot figure out how to get the complete topology of the summarized tree. — floatingpurr, Jul 10 '18 at 10:10

floatingpurr · Accepted Answer · 2018-07-10T13:43:57.770

Reasoning on AKSW clues in comments, a possible solution could be this one:

@prefix : <http://example.org#> .

select * where {
    ?node :hasSuborganization+ ?end .
    ?end  a :Special .
    FILTER NOT EXISTS {
        ?node :hasSuborganization+ ?mid .
        ?mid :hasSuborganization+ ?end .
        ?mid a :Special .
    }
    {
        select * where { 
            :orgA :hasSuborganization* ?node .
            ?node a :Special .
        }
    }
}

Explanation:

The innermost query returns all :Special nodes from a root node (i.e., :orgA).

select * where { 
    :orgA :hasSuborganization* ?node .
    ?node a :Special .
}

Then, the outer query selects all possible ?node :hasSuborganization+ ?end patterns. For example, for ?node = :orgA we get: A-D,A-E,E-H.

Finally, the outer query filters out patterns with a :Special intermediate node (i.e., ?mid)

FILTER NOT EXISTS {
    ?node :hasSuborganization+ ?mid .
    ?mid :hasSuborganization+ ?end .
    ?mid a :Special .
}

The final resultset is a collection of <?node, ?end> couples for building this summarized tree:

  A
 _|_
|   |
E   D
|
H

This query works fine, even it does not scale very well when the tree becomes huge. Optimizations or different strategies could be possible.

SPARQL query for collapsing branches of a tree (summarizing topology)

1 Answers1