Edit: When I use...
print("Result", result)
...I get this output:
Result <sparql._ResultsParser object at 0x7f05adbc9668>
...but I do not know if that just means the format is wrong somehow.
Edit 2: Following another request on Wikidata, and thanks to the comments in this thread, I concluded that querying Wikidata for every single relation is not feasible. So I ended up downloading a list of all properties with their English labels, descriptions and altLabels, and performing the search 'offline'. If need be, an inverted index would further increase the performance. The number of properties in Wikidata is relatively small. Here is the query, which you can run in the official SPARQL API to see what the result looks like:
SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
?property a wikibase:Property .
OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
}
GROUP BY ?property ?propertyLabel ?propertyDescription
Here is what it looks like inside my Python program, including parsing to fit my needs. I know that most of the prefixes in the query are unnecessary, but they do not hurt either:
from SPARQLWrapper import SPARQLWrapper, JSON
from datetime import datetime
File_object = open(r"/home/YOUR_NAME/PycharmProjects/Proj/data_files/wikidata_relation_labels.txt", "r+")
# https://stackoverflow.com/questions/30755625/urlerror-with-sparqlwrapper-at-sparql-query-convert
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
"(KHTML, like Gecko) Chrome/23.0.1271.64 "
"Safari/537.11")
sparql.setQuery("""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
?property a wikibase:Property .
OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
}
GROUP BY ?property ?propertyLabel ?propertyDescription
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
dateTimeObj = datetime.now()
print("timestamp: ", dateTimeObj)
for result in results["results"]["bindings"]:
    p_id = p_label = p_description = p_alt_labels = ""
    if result["property"]["value"]:
        p_id = result["property"]["value"].rsplit('/', 1)[1]
    if result["propertyLabel"]["value"]:
        p_label = result['propertyLabel']['value']
    # why all these "if"s? Because some properties have no description.
    if "propertyDescription" in result:
        if result["propertyDescription"]["value"]:
            p_description = result['propertyDescription']['value']
    if result["altLabel_list"]["value"]:
        p_alt_labels = result["altLabel_list"]["value"]
    File_object.write(p_id + " | " + p_label + " | " + p_description + " | " + p_alt_labels + "\n")
# rewind, otherwise reading starts at the end of what was just written
File_object.seek(0)
lines = File_object.readlines()
# simple way to check if Wikidata decided to include a pipe somewhere:
# each line is written with exactly 3 separators
for line in lines:
    if line.count('|') > 3:
        print("Too many pipes: ", line)
# sort and write the lines back
# (alternative through terminal: 'sort wikidata_relation_labels.txt -o wikidata_relation_labels.txt')
lines.sort()
File_object.seek(0)
File_object.writelines(lines)
File_object.truncate()
File_object.close()
I'm using pipes as separators. This could under many circumstances be considered bad practice.
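For illustration, here is a minimal sketch of the 'offline' search with an inverted index, assuming the "id | label | description | altLabels" line format written above. The function names (parse_line, build_index, search) and the sample rows with placeholder ids P_A, P_B, P_C are made up for this example and are not real Wikidata data. Note that the index is token-based, so it only finds whole words; the final substring check on the candidates is what makes it behave like a phrase search. An alternative label that itself contains ", " would be split incorrectly.

```python
from collections import defaultdict

# Illustrative rows only (placeholder ids, not real Wikidata properties),
# in the same "id | label | description | altLabels" format as the file.
SAMPLE_LINES = [
    "P_A | has effect | effect of this item | causes, leads to\n",
    "P_B | has cause | underlying cause of this item | caused by, has effect on\n",
    "P_C | color |  | colour\n",
]

def parse_line(line):
    """Split one line of the file into a dict (hypothetical helper)."""
    # maxsplit=3 keeps a stray " | " inside the last field from breaking the row
    p_id, p_label, p_description, p_alt = line.rstrip("\n").split(" | ", 3)
    alt_labels = [a for a in p_alt.split(", ") if a]
    return {"id": p_id, "label": p_label,
            "description": p_description, "alt_labels": alt_labels}

def build_index(properties):
    """Map every lowercased word of a label or altLabel to the property ids using it."""
    index = defaultdict(set)
    for prop in properties:
        for text in [prop["label"], *prop["alt_labels"]]:
            for token in text.lower().split():
                index[token].add(prop["id"])
    return index

def search(index, properties_by_id, phrase):
    """Return (id, label) pairs whose label or altLabel contains the phrase."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    # candidates must contain every word of the phrase somewhere
    candidates = set.intersection(*(index.get(t, set()) for t in tokens))
    phrase_lower = phrase.lower()
    hits = []
    for p_id in sorted(candidates):
        prop = properties_by_id[p_id]
        texts = [prop["label"], *prop["alt_labels"]]
        # verify the words really occur together, as one phrase
        if any(phrase_lower in text.lower() for text in texts):
            hits.append((p_id, prop["label"]))
    return hits

properties = [parse_line(line) for line in SAMPLE_LINES]
properties_by_id = {prop["id"]: prop for prop in properties}
index = build_index(properties)

for hit in search(index, properties_by_id, "has effect"):
    print(hit)
```

With the real file, SAMPLE_LINES would be replaced by the lines read from wikidata_relation_labels.txt; the index then only needs to be built once per run.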
I'm trying to obtain the id and label of all Wikidata properties where either the property's label or one of its "also known as" (alternative) labels equals/contains a given string (relation.label).
I'm using this SPARQL client/API (with a somewhat contradictory description) in Python 3.x.
Here's my code snippet:
import sparql

endpoint = 'https://query.wikidata.org/sparql'

def is_simple_relation(relation):
    s = sparql.Service(endpoint, "utf-8", "GET")
    q = """SELECT DISTINCT ?property ?propertyLabel WHERE {
    ?property rdf:type wikibase:Property;
        rdfs:label ?propertyLabel;
        skos:altLabel ?altLabel.
    FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
    FILTER(CONTAINS(?propertyLabel, "replace_me") || CONTAINS(?altLabel, "replace_me")).
    }
    LIMIT 100"""
    q = q.replace('replace_me', relation.label)
    print("Query: ", q)
    print("Querying")
    result = sparql.query(endpoint, q)
    print("Finished query")
    for row in result.fetchone():
        print("row: ", row)
My output is:
Query: SELECT DISTINCT ?property ?propertyLabel WHERE {
?property rdf:type wikibase:Property;
rdfs:label ?propertyLabel;
skos:altLabel ?altLabel.
FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
FILTER(CONTAINS(?propertyLabel, "has effect") || CONTAINS(?altLabel, "has effect")).
}
LIMIT 100
Querying
Finished query
That means I'm not retrieving anything. I have tried to perform the query here and it works as expected, so the query is fine. I have also tried performing one of the example queries inside my program and it works as expected, printing multiple rows as intended.
The only possible cause I can think of is that the query takes so much longer when executed from my program that it hits a timeout, whereas it finishes just in time when run through the second link. But I am not getting a warning or any error. Is my assumption correct? And if so, can my query be improved? There may be a performance killer that I am not aware of.
Thanks!