
I am using the following query to get the DBpedia categories (i.e. the objects of `skos:broader|dct:subject`) of a given DBpedia URI.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

all_urls = ['http://dbpedia.org/resource/Machine_learning', 'http://dbpedia.org/resource/Category:Machine_learning']

for url in all_urls:
    print("------")
    print(url)
    print("------")
    # skos: and dct: are among the prefixes predefined on DBpedia's endpoint
    sparql.setQuery("""
        SELECT * WHERE { <""" + url + """> skos:broader|dct:subject ?resource }
    """)

    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print('resource ---- ', result['resource']['value'])

The output is:

------
http://dbpedia.org/resource/Machine_learning
------
resource ----  http://dbpedia.org/resource/Category:Cybernetics
resource ----  http://dbpedia.org/resource/Category:Learning
resource ----  http://dbpedia.org/resource/Category:Machine_learning
------
http://dbpedia.org/resource/Category:Machine_learning
------
resource ----  http://dbpedia.org/resource/Category:Artificial_intelligence
resource ----  http://dbpedia.org/resource/Category:Learning

Now I want to check the distance of each category in the output from the top level of DBpedia's category system (according to my current understanding, dbc:Contents is the top element of the skos:broader and dct:subject hierarchy). Is it possible to do this in SPARQL?
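For reference, one common SPARQL 1.1 idiom approximates this by counting the intermediate nodes on a `skos:broader` path between the category and the top element. A minimal sketch, assuming the standard `dbc:` prefix; note that it only gives the exact distance when the upward path is unique (with several parallel paths it counts the union of their nodes), and on a hierarchy as large and cycle-prone as DBpedia's it may simply time out:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>

# ?mid ranges over every node from the start category (inclusive)
# up to dbc:Contents (exclusive), so the count is the number of
# skos:broader steps on the way up.
SELECT (COUNT(DISTINCT ?mid) AS ?distance) WHERE {
  dbc:Machine_learning skos:broader* ?mid .
  ?mid skos:broader+ dbc:Contents .
}
```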

I am happy to provide more details if needed.

EmJ
  • Possible duplicate of [Calculate length of path between nodes?](https://stackoverflow.com/questions/5198889/calculate-length-of-path-between-nodes) – UninformedUser Jul 03 '19 at 02:53
  • And I can tell you, this will lead to a timeout and will not scale. Moreover, the solution has limitations once cycles occur, which, as far as I remember, is possible in the DBpedia/Wikipedia category system. Last but not least, that's just something SPARQL isn't designed for; graph traversal languages would be the better way here. – UninformedUser Jul 03 '19 at 02:58
  • @AKSW Thanks a lot for the comments. I will check the answer that you have linked. Could you please let me know if the below mentioned sentence is correct? `dbc:Contents is the top element in the skos:broader and dct:subject hierarchy`. Thank you :) – EmJ Jul 03 '19 at 04:15
  • Technically, yes. You can see this here https://en.wikipedia.org/wiki/Category:Contents and via `select * { dbc:Contents skos:broader ?o }` being empty. But clearly, a more meaningful top level would be something below it, like https://en.wikipedia.org/wiki/Category:Articles or maybe even its subcategory https://en.wikipedia.org/wiki/Category:Main_topic_classifications - it depends on the application. – UninformedUser Jul 03 '19 at 05:41
  • That said, you never said what you're doing in general. What is the purpose of your project? I mean, you've already been working on it for several weeks now; I'm not sure if it's worth it, or whether it's overcomplicated or the wrong direction. – UninformedUser Jul 03 '19 at 05:42
  • @AKSW Thanks a lot for your valuable comments. I am actually working on a Computer Science related textual dataset. My task is to identify valid computer science related terms from the dataset and do the remaining processing. I have finished the processing part. However, since my `identification of valid computer science related terms` is very noisy, my processed results are not good. :( My team is happy with my selection of using wikipedia/dbpedia/wikidata to detect valid computer science related terms. However, I still could not find a way to accomplish my task. – EmJ Jul 03 '19 at 06:07
  • @AKSW What I mean by `Computer Science related terms` is algorithms (e.g., `word2vec`), and computer science branches (e.g., `machine learning`). Examples of noisy terms are `intuition`, `one time`, `long short`, `friendship` etc. Please kindly let me know if you have any suggestions :) – EmJ Jul 03 '19 at 06:09
  • I cannot give you any suggestions as I don't know how you do the extraction from text nor do I know how you do the mapping from the words in the text to the DBpedia/Wikidata resources. I mean, you're the only person who knows why the noisy term occur. – UninformedUser Jul 03 '19 at 06:51
  • @AKSW I know you have a lot of expertise in this area and have really great ideas. It would be greatly appreciated if you could show me a path that I can follow to accomplish my task. Please kindly let me know if my description in the previous comment is not clear. Thank you very much :) – EmJ Jul 03 '19 at 07:12

0 Answers