
I have a DBpedia query and I want to rank the results using the PageRank algorithm.

For the concept "Machine_learning", the SPARQL query below finds all the parent nodes, child nodes and sibling nodes in DBpedia.

```sparql
SELECT * WHERE {
  { ?childNodes skos:broader <http://dbpedia.org/resource/Category:Machine_learning> .
    ?childNodes skos:broader ?siblingConceptsFormChildNodes }
  UNION
  { <http://dbpedia.org/resource/Category:Machine_learning> skos:broader ?parentNodes .
    ?siblingConceptsFormParentNodes skos:broader ?parentNodes }
}
```
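
For reference, a minimal sketch of how this query could be run from Python with SPARQLWrapper (the endpoint URL and the result handling are illustrative assumptions, not part of the original setup):

```python
# Run the SELECT query above against the public DBpedia endpoint (assumed URL)
# and print one dict per result row.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT * WHERE {
  { ?childNodes skos:broader <http://dbpedia.org/resource/Category:Machine_learning> .
    ?childNodes skos:broader ?siblingConceptsFormChildNodes }
  UNION
  { <http://dbpedia.org/resource/Category:Machine_learning> skos:broader ?parentNodes .
    ?siblingConceptsFormParentNodes skos:broader ?parentNodes }
}
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print({name: binding["value"] for name, binding in row.items()})
```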

For visualization, the topic hierarchy looks like this: [Regulated concept map]

As you may have noticed, the topic hierarchy is based on the skos:broader and skos:narrower properties.

My intention is to rank all the nodes that exist in the topic hierarchy with PageRank. The results from the query above are limited.

I also found this question, which seems related to mine: How to use DBpedia properties to build a topic hierarchy?

However, I think our approaches differ a bit.

I also adjusted the PageRank algorithm for the topic hierarchy above:

[PageRank algorithm]
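
Since the adjusted formula is only shown in the image above, here is a minimal Python sketch of the standard PageRank iteration for reference; it is not the adjusted variant, and the damping factor and iteration count are just conventional defaults:

```python
# Plain PageRank on a directed graph given as an adjacency dict:
# node -> list of nodes it links to (here: category -> its skos:broader parents).
def pagerank(graph, damping=0.85, iterations=50):
    # Collect every node that appears as a source or a target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Toy example (hypothetical category names, not real DBpedia data):
toy = {"Neural_networks": ["Machine_learning"],
       "Classification": ["Machine_learning"],
       "Machine_learning": ["Artificial_intelligence"]}
print(pagerank(toy))
```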

Thank you in advance!

BBQ
  • do you have a specific question now? I mean, the SPARQL query you got from an earlier question does return what you want. So what are you asking for here? – UninformedUser Sep 09 '20 at 12:16
  • Thank you for your comment. I am going to calculate the importance of those nodes, so what I am asking is: is it possible to achieve this with SPARQLWrapper in Python? – BBQ Sep 09 '20 at 14:17
  • possible? I mean, why not - you just have to run thousands of SPARQL queries that compute the counts, right? Which can take a long time for sure. You should also load the DBpedia dataset into your local triple store. I doubt that using a shared DBpedia endpoint is the best service for running heavy loads – UninformedUser Sep 09 '20 at 14:36
  • you should also check the literature, I'm aware of at least a few approaches that computed the PageRank (among other scores) for DBpedia. Also, some triple stores have PageRank computation as an additional feature, e.g. GraphDB (RDF Rank) and Virtuoso have such functionality to compute the score based on links. This is for sure way more efficient than running all the SPARQL queries you'd need via HTTP. And batch queries will most likely time out, I guess – UninformedUser Sep 09 '20 at 14:40
  • I am not trying to run thousands of SPARQL queries. I am only building a topic hierarchy for a specific keyword/concept and calculating the importance of all nodes included in the topic hierarchy. Thank you so much for your helpful comments! I will try it out using Python. – BBQ Sep 10 '20 at 05:26
  • ok, then it might scale. I don't know which kind of things you need, but if I remember correctly, you'll need incoming and outgoing links as well as the count of each. And then you can do the pagerank computation locally. – UninformedUser Sep 10 '20 at 05:49
  • Yes, that is exactly what I am going to do. I will need a keyword as input, and the regulated concept map will be generated. Then I will need incoming and outgoing links as well as the count of each. Thank you so much for clarifying my problem. – BBQ Sep 10 '20 at 07:54
  • @UninformedUser For my idea, is it possible that I only query the data via the DBpedia endpoint and calculate from that, as I do not care about the calculation time? At this moment, I just want to get a result first. Thank you so much in advance. – BBQ Sep 22 '20 at 04:17
  • how do you calculate the PageRank with SPARQL? I mean, the PageRank algorithm needs iterations, which is impossible with SPARQL. I also don't see the need for it, given that you can easily fetch your subset of the data and do the computation in your client code – UninformedUser Sep 22 '20 at 13:56
  • Thank you for your comment! I understand that I have to do the PageRank computation locally. Is it correct that, if I would like to calculate the PageRank value of the topic hierarchy, I query as below: `select * where { { ?childNodes skos:broader <http://dbpedia.org/resource/Category:Machine_learning> . ?childNodes skos:broader ?sameLevelConceptsFormChildNodes} UNION { <http://dbpedia.org/resource/Category:Machine_learning> skos:broader ?parentNodes . ?sameLevelConceptsFormParentNodes skos:broader ?parentNodes} }` – BBQ Sep 23 '20 at 06:39
  • "Wikipedia Article Categories and categories metadata (2020.07.01)", 44.97 MB, en, skos. Is this data set suitable for my problem? @UninformedUser https://databus.dbpedia.org/dbpedia/collections/latest-core – BBQ Sep 23 '20 at 06:42
  • yes, I think using the latest categories subset and loading the file locally would be the best way. In the end you need all edges between all pairs of categories for the computation of a proper PageRank value. – UninformedUser Sep 23 '20 at 07:01
  • Thank you for the immediate reply. At this moment, I need to find out how to load the data I mentioned above using Python. Meanwhile, I have to find out how to query the topic hierarchy locally for the PageRank calculation. Am I correct? @UninformedUser – BBQ Sep 23 '20 at 07:10
  • loading can be done via the `rdflib` Python project. Not sure if you really need to query the data then. I mean, it's not that large, I think, so why not compute PageRank on the whole dataset? But if you really want to query it locally, you can again use `rdflib`. But I don't know what you want to query for. Convert it to a networkx graph and let networkx do the [pagerank calculation](https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html) – UninformedUser Sep 23 '20 at 07:14
  • Thank you for your suggestion! I will try to load the data set using `rdflib`. Computing PageRank on the whole dataset is also one of my aims, so that I can compare the resulting PageRank values against the Regulated concept map I mentioned above. Again, thank you so much for your help! @UninformedUser – BBQ Sep 23 '20 at 07:57
  • @UninformedUser After several trials, I am still not able to load the data set using `RDFLib`. However, I have a new idea and wonder whether it is feasible. I realized that the `RDFLib` package can be used to convert a SPARQL query result into a `Graph` instance. After I get the `Graph` instance, such as the `Regulated concept map` I mentioned above, are there Python packages that can calculate the `PageRank` value of that `Graph` instance? Thank you very much in advance! – BBQ Sep 28 '20 at 07:21
  • what does "not able" mean? But yes, you could also convert the `Graph` in rdflib to a networkx graph, see https://rdflib.readthedocs.io/en/stable/_modules/rdflib/extras/external_graph_libs.html - and then compute the PageRank. Clearly, you have to use a SPARQL CONSTRUCT query, as it needs RDF triples to create a graph – UninformedUser Sep 28 '20 at 07:49
  • Since my coding skills are really weak, I am not able to turn the SPARQL query into Python code for the `RDFLib` package. Thank you for your answer! As you mentioned a `SPARQL construct query`, how is it different from a regular SPARQL query? How can I turn my SPARQL query above into a `SPARQL construct query`? Sorry that I lack knowledge of this aspect. Thank you so much! – BBQ Sep 28 '20 at 08:03
  • @UninformedUser From this site, [Link](https://www.futurelearn.com/courses/linked-data/0/steps/16104), I understood the difference between a SPARQL CONSTRUCT query and a plain SPARQL query. However, I am still confused about how to turn my SPARQL query above into a SPARQL CONSTRUCT query. – BBQ Sep 28 '20 at 08:30
  • `construct {?child skos:broader ?parent } where { { ?child skos:broader <http://dbpedia.org/resource/Category:Machine_learning> . ?child skos:broader ?parent} UNION { <http://dbpedia.org/resource/Category:Machine_learning> skos:broader ?parent . ?child skos:broader ?parent} }` creates edges as in the original DBpedia graph. Indeed the edges to the source category are missing, you have to also add them (see the sketch after these comments for the full pipeline). – UninformedUser Sep 28 '20 at 08:39
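
Putting the suggestions from these comments together, here is a minimal end-to-end sketch: run the CONSTRUCT query with SPARQLWrapper, get the result as an rdflib `Graph`, convert it to a networkx graph, and let networkx compute PageRank. The endpoint URL, the variable names and the top-10 printout are illustrative assumptions; the same conversion and PageRank calls also work on a `Graph` parsed from the downloaded categories dump instead of a query result.

```python
# Fetch the skos:broader sub-graph around Category:Machine_learning as RDF triples,
# convert it to a directed networkx graph, and compute PageRank on it.
from SPARQLWrapper import SPARQLWrapper
from rdflib.extras.external_graph_libs import rdflib_to_networkx_digraph
import networkx as nx

CATEGORY = "http://dbpedia.org/resource/Category:Machine_learning"

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {{ ?child skos:broader ?parent }}
WHERE {{
  {{ ?child skos:broader <{CATEGORY}> . ?child skos:broader ?parent }}
  UNION
  {{ <{CATEGORY}> skos:broader ?parent . ?child skos:broader ?parent }}
}}
""")

# With the default (XML) return format, SPARQLWrapper parses a CONSTRUCT result
# into an rdflib Graph.
rdf_graph = sparql.query().convert()

# Each triple becomes an edge subject -> object in the networkx DiGraph.
nx_graph = rdflib_to_networkx_digraph(rdf_graph)
ranks = nx.pagerank(nx_graph)

# Print the ten highest-ranked categories.
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True)[:10]:
    print(f"{score:.5f}  {node}")
```

For the whole-dataset variant discussed above, the CONSTRUCT step would be replaced by parsing the downloaded categories file with `rdflib.Graph().parse(...)`; the `rdflib_to_networkx_digraph` and `nx.pagerank` calls stay the same.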

1 Answer


If you have not already solved your problem, you might consider loading the DBpedia data into AnzoGraph and then using its built-in PageRank service. See docs and examples here: https://docs.cambridgesemantics.com/anzograph/v2.2/userdoc/pagerank.htm

Disclaimer: I work for Cambridge Semantics Inc.

Sean Martin
  • Thank you for your suggestion. The method you suggested seems helpful, but it is not suitable for my case. Still, I think it will be useful for comparing results. – BBQ Sep 28 '20 at 05:04