Supposedly correct SPARQL query (Wikidata) not yielding any results in Python

Question

Edit: When I use...

print("Result", result)

...I get this output:

Result <sparql._ResultsParser object at 0x7f05adbc9668>

...but I do not know if that just means the format is wrong somehow.

Edit 2: Following another request on Wikidata, and thanks to the comments in this thread, I concluded that querying Wikidata for every single relation is not feasible. So I ended up downloading a list of all properties with their English labels, descriptions and altLabels, and I now perform the search 'offline'. If need be, an inverted index would further improve performance, although the number of properties in Wikidata is relatively small. Here is the query, which you can run in the official SPARQL API to see what the result looks like:

SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
    ?property a wikibase:Property .
    OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
 }
GROUP BY ?property ?propertyLabel ?propertyDescription

Here is what it looks like inside my Python program, including parsing to fit my needs. I know that most of the prefixes in the query are unnecessary, but they do not hurt either:

from SPARQLWrapper import SPARQLWrapper, JSON
from datetime import datetime

# "w+" creates (or truncates) the file and still allows reading it back later
File_object = open(r"/home/YOUR_NAME/PycharmProjects/Proj/data_files/wikidata_relation_labels.txt", "w+", encoding="utf-8")

# https://stackoverflow.com/questions/30755625/urlerror-with-sparqlwrapper-at-sparql-query-convert
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
                                                                  "(KHTML, like Gecko) Chrome/23.0.1271.64 "
                                                                  "Safari/537.11")
sparql.setQuery("""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
    ?property a wikibase:Property .
    OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
 }
GROUP BY ?property ?propertyLabel ?propertyDescription
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
dateTimeObj = datetime.now()
print("timestamp: ", dateTimeObj)
for result in results["results"]["bindings"]:
    p_id = p_label = p_description = p_alt_labels = ""
    if result["property"]["value"]:
        p_id = result["property"]["value"].rsplit('/', 1)[1]
    if result["propertyLabel"]["value"]:
        p_label = result['propertyLabel']['value']
    # why all these "if"s? Because some properties have no description.
    if "propertyDescription" in result:
        if result["propertyDescription"]["value"]:
            p_description = result['propertyDescription']['value']
    if result["altLabel_list"]["value"]:
        p_alt_labels = result["altLabel_list"]["value"]
    File_object.write(p_id + " | " + p_label + " | " + p_description + " | " + p_alt_labels + "\n")

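# rewind first: after writing, the file position is at the end of the file
File_object.seek(0)
lines = File_object.readlines()

# simple way to check if Wikidata decided to include a pipe somewhere
# (every line is written with exactly three " | " separators)
for line in lines:
    if line.count('|') > 3:
        print("Too many pipes: ", line)

# sort the lines and write them back in place
# (alternative from a terminal: 'sort wikidata_relation_labels.txt -o wikidata_relation_labels.txt')
lines.sort()
File_object.seek(0)
File_object.truncate()
File_object.writelines(lines)

File_object.close()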

I'm using pipes as separators. This could under many circumstances be considered bad practice, because a label or description that itself contains a pipe would break the format.
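For the 'offline' search and the inverted index mentioned in Edit 2, something along these lines might work. It is only a sketch: it assumes the "id | label | description | altLabels" line format written above, and the function names (`parse_line`, `build_index`, `search`) as well as the two sample rows with placeholder ids `P_A` and `P_B` are made up for illustration, not real Wikidata entries.

```python
import re
from collections import defaultdict


def parse_line(line):
    """Split one 'id | label | description | altLabels' line into its fields."""
    # Only strip the newline: an empty last field ends the line with " | ".
    fields = line.rstrip("\n").split(" | ", 3)
    if len(fields) != 4:
        raise ValueError("unexpected line format: " + repr(line))
    p_id, label, description, alt_labels = fields
    alts = [alt for alt in alt_labels.split(", ") if alt]
    return p_id, label, description, alts


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(lines):
    """Map every word of a label or alt label to the ids of the properties using it."""
    index = defaultdict(set)
    names = {}
    for line in lines:
        p_id, label, _description, alts = parse_line(line)
        names[p_id] = [label] + alts
        for name in names[p_id]:
            for token in tokenize(name):
                index[token].add(p_id)
    return index, names


def search(index, names, phrase):
    """Return the sorted ids whose label or alt label contains the phrase."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    # The index narrows the candidates down to properties containing every word ...
    candidates = set.intersection(*(index.get(token, set()) for token in tokens))
    # ... and a substring check confirms that the words appear together in one name.
    needle = phrase.lower()
    return sorted(p_id for p_id in candidates
                  if any(needle in name.lower() for name in names[p_id]))


# Made-up sample rows in the file format used above (placeholder ids).
sample_lines = [
    "P_A | has effect | effect of this item | results in, causes\n",
    "P_B | has cause | underlying cause | caused by, effect of\n",
]
index, names = build_index(sample_lines)
print(search(index, names, "has effect"))
```

Note that the index matches whole words only, so searching for "cause" would not find the alt label "causes". If real substring matching is needed, a plain loop over all lines is probably fast enough given how few properties there are.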

I'm trying to obtain the id and label of all Wikidata properties where either the property's label or one of its "also known as" (alternative) labels equals/contains a given string (relation.label).

I'm using this SPARQL client/API (with a somewhat contradictory description) in Python 3.x.

Here's my code snippet:

import sparql

endpoint = 'https://query.wikidata.org/sparql'

def is_simple_relation(relation):
    s = sparql.Service(endpoint, "utf-8", "GET")
    q = """SELECT DISTINCT ?property ?propertyLabel WHERE {
         ?property rdf:type wikibase:Property;
         rdfs:label ?propertyLabel;
         skos:altLabel ?altLabel.
         FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
         FILTER(CONTAINS(?propertyLabel, "replace_me") || CONTAINS(?altLabel, "replace_me")).
         }
         LIMIT 100"""
    q = q.replace('replace_me', relation.label)
    print("Query: ", q)
    print("Querying")
    result = sparql.query(endpoint, q)
    print("Finished query")
    for row in result.fetchone():
        print("row: ", row)

My output is:

Query:  SELECT DISTINCT ?property ?propertyLabel WHERE {
         ?property rdf:type wikibase:Property;
         rdfs:label ?propertyLabel;
         skos:altLabel ?altLabel.
         FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
         FILTER(CONTAINS(?propertyLabel, "has effect") || CONTAINS(?altLabel, "has effect")).
         }
         LIMIT 100
Querying
Finished query

That means, I'm not retrieving anything. I have tried to perform the query here and it works as expected, so the query is fine. I have tried performing one of the example queries inside my program and it works as expected, printing multiple rows as intended.

The only possible cause I can think of is that the query takes just that much longer when executed from my program that a timeout is reached, whereas the query is evaluated just in time through the second link. But I am not getting a warning or anything. Is my assumption correct? And if so, can my query be improved? There may be a performance killer that I am not aware of.

Thanks!

"and it works as expected, so the query is fine." - Nah. The query is fine, sure, but it does not work in the web interface, at least not always; maybe you got lucky once. I tried it just now and got an exception in the web GUI, `Server error: Unexpected end of JSON input`, so most likely the result stream is cut off, which would also explain your error message - UninformedUser, Oct 22 '20 at 07:55
Sometimes it works; just now I got no error. One comment, though: your query is weird. The filter `FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").` is wrong: `[AUTO_LANGUAGE]` is something you can use in the label service, not in a standard SPARQL lang filter. You should use `FILTER(LANG(?propertyLabel) = "en"). FILTER(LANG(?altLabel) = "en")` for English labels only. Then it works more or less fast and stable - UninformedUser, Oct 22 '20 at 07:58
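The change suggested in the comments could be sketched roughly as follows. This only builds the query string and does not contact the endpoint. The names `escape_sparql_string`, `QUERY_TEMPLATE` and `build_property_query` are made up for illustration, and the escaping step is an extra assumption that the search text may contain quotes or backslashes.

```python
def escape_sparql_string(text):
    """Escape characters that would end or corrupt a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


# Same query as in the question, but with explicit language filters on both
# the label and the alt label instead of "[AUTO_LANGUAGE]".
QUERY_TEMPLATE = """SELECT DISTINCT ?property ?propertyLabel WHERE {{
  ?property rdf:type wikibase:Property ;
            rdfs:label ?propertyLabel ;
            skos:altLabel ?altLabel .
  FILTER(LANG(?propertyLabel) = "{lang}")
  FILTER(LANG(?altLabel) = "{lang}")
  FILTER(CONTAINS(?propertyLabel, "{needle}") || CONTAINS(?altLabel, "{needle}"))
}}
LIMIT 100"""


def build_property_query(search_text, lang="en"):
    """Fill the template with an escaped search string and a language code."""
    return QUERY_TEMPLATE.format(lang=lang, needle=escape_sparql_string(search_text))


query = build_property_query("has effect")
print(query)
```

The resulting string can then be passed to whichever client is in use, for example to `setQuery` in the SPARQLWrapper code from Edit 2. Like the original query, this version only finds properties that have at least one alt label, because the `skos:altLabel` pattern is not optional.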
