What does the "identifier" in "Graph" do?

Question

I try to query a database like this:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS
from rdflib.plugins.stores import sparqlstore


# define endpoint according to https://www.stardog.com/docs/
endpoint = 'http://path/to/query'  # http://<server>:<port>/{db}/query

# create store
store = sparqlstore.SPARQLUpdateStore()

# I only want to query
store.open(endpoint)
store.setCredentials('me', 'my_pw')

# What does this actually do? That runs through
default_graph = URIRef('some:stuff')
ng = Graph(store, identifier=default_graph)
# # If identifier is not defined, it crashes
# ng = Graph(store)

rq = """
SELECT ?foo ?bar 
WHERE {
  ?something a <http://path/to/data/.ttl#SomeValues>.
  ?something <http://path/to/data/.ttl#foo> ?foo.
  ?something <http://path/to/data/.ttl#bar> ?bar.                       
}
"""

query_res = ng.query(rq)
for s, l in query_res:
    print(s, l)

Unfortunately, I don't get any results at the moment:

<head><variable name="foo"></variable><variable name="bar"></variable></head><results></results></sparql>

My question is what the identifier in Graph does, i.e. whether it is important and, if so, how it should be defined. When I do not define it, the code crashes with:

Response: b'{"message":"No separator character found in the URI: N53e412e0f3a74d6eab7ed6da163463bf"}'

If I put in anything else that has a colon or a slash in it, it runs through (but the query still does not return anything).

Could anyone briefly explain what one should put in there and whether this might be the cause of the unsuccessful query (the query itself is correct; when I run it from another tool, it works fine)?

I guess, you could try `"tag:stardog:api:context:default"` for Stardog. — Stanislav Kralin, Mar 06 '18 at 18:33
@StanislavKralin: You mean `"tag:stardog:api:context:default"` instead of `"some:stuff"`? Thanks for suggesting a debugging strategy; for me that topic is quite new, so I do not really know how to approach that... What does this query do? Just getting ten entries from the database? Is there a way to check what is in the `store` and/or `ng` i.e. is there a way to check whether the connection to the database is functional and browse its contents? — Cleb, Mar 06 '18 at 18:39
Yes, instead of "some:stuff". Most likely, identifier is a value of [`default-graph-uri`](https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-operation). You should know the name of a graph in which your stuff is stored... However, in many triplestores there exists "special" named graphs, i. e. `tag:stardog:api:context:default` in Stardog. — Stanislav Kralin, Mar 06 '18 at 19:26
@StanislavKralin: Thanks again, will try! Is there a way to check what is in the `store` and/or `ng` i.e. is there a way to check whether the connection to the database is functional and browse its contents? I would like to narrow it down to i) connection failed or ii) the query failed. Any ideas for that? — Cleb, Mar 06 '18 at 19:38
In Stardog, `tag:stardog:api:context:default` is a name of "unnamed" graph. Try to get content of this unnamed graph using simple queries, for example `SELECT * { ?s ?p ?o } LIMIT 10`. Are results looking like your stuff? — Stanislav Kralin, Mar 06 '18 at 19:38
I think, connection is functional, results look like [SPARQL XML results](https://www.w3.org/TR/rdf-sparql-XMLres/#head). — Stanislav Kralin, Mar 06 '18 at 19:41
@StanislavKralin: Using `"tag:stardog:api:context:default"` does the trick. So please go ahead and add it as an answer which I will upvote and accept (please also add your debugging strategy which might also help others, the `SELECT * { ?s ?p ?o } LIMIT 10`). Not part of the original question, but what is now the easiest way to clean the results for `foo` and `bar` (the values appear at the end of a link; would I have to clean this link manually or is there something built-in?), so e.g. `s` looks like `http://path/to/#foo_i` and I only want to have `foo_i` for all `i`. Thanks! — Cleb, Mar 07 '18 at 07:43
It is not a good practice to parse URIs, perhaps these entities have `rdfs:labels`. Anyway, see https://stackoverflow.com/a/26405954/7879193. Possibly Stardog has builtins for this like Jena's `afn:localname`. — Stanislav Kralin, Mar 07 '18 at 07:57
@StanislavKralin: Ok, will need to dig into this; first attempts all lead to errors... Anyway, this would also be a question on its own, so then just add `"tag:stardog:api:context:default"` as an answer and I open a new question if necessary. — Cleb, Mar 07 '18 at 08:11

Stanislav Kralin · Accepted Answer · 2018-03-16T06:30:07.157

The identifier argument of the Graph constructor gives the RDFLib graph a name. If the value is None, a blank node is used as the identifier.

However, if the store is a SPARQLUpdateStore, the identifier is also sent as the default-graph-uri parameter of the SPARQL Protocol, and hence cannot be a blank node.
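To make the role of default-graph-uri more concrete, here is a minimal sketch of what a SPARQL Protocol query request looks like when a default graph is supplied. It uses only the standard library and is not RDFLib's actual code; `build_query_url` and the endpoint `http://localhost:5820/mydb/query` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode

def build_query_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL (illustrative helper, not an RDFLib function)."""
    params = [("query", query)]
    if default_graph is not None:
        # The protocol expects an IRI here; a bare blank node ID such as
        # N53e412e0f3a74d6eab7ed6da163463bf is not an IRI, which matches the
        # "No separator character found in the URI" error in the question.
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint, following the http://<server>:<port>/{db}/query pattern
url = build_query_url(
    "http://localhost:5820/mydb/query",
    "SELECT * { ?s ?p ?o } LIMIT 10",
    default_graph="tag:stardog:api:context:default",
)
print(url)
```

The server evaluates the query against whatever graph that parameter names, so a syntactically valid but non-existent graph such as `some:stuff` is accepted and simply yields no rows.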

Thus, the problem is: what is the name of the default "unnamed" graph in a remote triplestore?

From Stardog's documentation:

Naming

Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are provided for convenience and can be used wherever named graph IRIs are expected. This includes SPARQL queries & updates, property graph operations and configuration values. Following is a list of special named graph IRIs.
          Named Graph IRI                             Refers to                
--------------------------------  ---------------------------------------------
tag:stardog:api:context:default   the default (no) context graph              
tag:stardog:api:context:all       all contexts, including the default graph    
tag:stardog:api:context:named     all named graphs, excluding the default graph

I can't find any public or private Stardog endpoint (it seems that ABS's endpoint is down). Example on DBpedia:

from rdflib import Graph, URIRef
from rdflib.plugins.stores import sparqlstore

store = sparqlstore.SPARQLUpdateStore()
store.open('http://dbpedia.org/sparql')

default_graph = URIRef('http://people.aifb.kit.edu/ath/#DBpedia_PageRank') 
ng = Graph(store, identifier=default_graph)

rq = """
    SELECT ?foo ?foobar {
      ?foo ?foobar ?bar                       
    } LIMIT 100
"""

query_res = ng.query(rq)
for s, l in query_res:
    print(s, l)

The results look as expected. In your code, too, the name of the unnamed graph is the only problem; the response you got is a correct (though empty) SPARQL XML result.
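To tell "the connection failed" apart from "the query matched nothing", it can help to look at the raw response. The sketch below parses a SPARQL XML result with the standard library and reports the variables and rows it contains. The sample document is made up to resemble the response in the question, and `summarize_results` is a hypothetical helper, not part of RDFLib.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SPARQL Query Results XML Format
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def summarize_results(xml_text):
    """Return (variable names, list of row dicts) from a SPARQL XML result."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", SPARQL_NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding wraps one <uri>, <literal> or <bnode> element
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return variables, rows

# Made-up response: well-formed, but with no matching rows
empty_response = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="foo"/><variable name="bar"/></head>
  <results></results>
</sparql>"""

variables, rows = summarize_results(empty_response)
print(variables, len(rows))
```

A response that parses and lists the expected variables but has zero rows suggests the endpoint and credentials are fine and the query is simply running against the wrong graph.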

P.S. Possibly you could try sparqlwrapper instead of rdflib for your purpose.

`tag:stardog:api:context:default` indeed does the trick. I am not bound to `rdflib` and am more than happy to use the `sparqlwrapper`. Would it be much work for you to adapt the code above to make use of it or could you point me to a code sample where also the credentials are set? — Cleb, Mar 07 '18 at 15:19
Adapting [this code](https://gist.github.com/wikier/38c0e3a4925b0bb81dd0) seems to work fine, too. It even seems to work without the default graph definition. — Cleb, Mar 07 '18 at 15:51
Yes, this is shorter using SPARQLWrapper. RDFLib's `SPARQLUpdateStore` wraps SPARQLWrapper, AFAIK. — Stanislav Kralin, Mar 07 '18 at 17:07
So, was that the `sparqlwrapper` you referred to in your answer or is that a different one? Is it better to use it directly or are there advantages using the `SPARQLUpdateStore`? — Cleb, Mar 07 '18 at 18:19
If you don't want to store triples, use SPARQLWrapper directly. — Stanislav Kralin, Mar 07 '18 at 18:22
