How can I extract from Wikidata all the properties, together with their labels, that are rendered on an item's web page, preferably via SPARQL?

Take, for example, the Google entry on Wikidata. Under the properties P414 (stock exchange) and P159 there are "subproperties" such as P969 (located at street address). They do show up as qualifiers when you query wbgetentities. The problem with wbgetentities is that the labels are missing. I get the desired output (e.g. wdt:P17 => country => United States of America) with the following SPARQL query:

SELECT ?prop_id ?prop_label ?prop_val_label WHERE {
  VALUES (?company) {
    (wd:Q95)
  }
  ?company ?prop_id ?company_item.
  ?wd wikibase:directClaim ?prop_id.
  ?wd rdfs:label ?prop_label.
  OPTIONAL {
    ?company_item rdfs:label ?prop_val.
    FILTER((LANG(?prop_val)) = "en")
  }
  BIND(COALESCE(?prop_val, ?company_item) AS ?prop_val_label)
  FILTER((LANG(?prop_label)) = "en")
}

But those "subproperties" are missing because they are not direct claims. To extract a single statement's qualifier I can do:

SELECT ?statement ?hq ?country WHERE {
  wd:Q95 p:P159 ?statement.
  OPTIONAL {
    ?statement ps:P159 ?hq.
    ?statement pq:P17 ?country.
  }
}

But is there a way to combine everything into one query?

MrKaikev

1 Answer

Useful links on the Wikidata data model:

Your query should be of this kind:

SELECT ?wdLabel ?ps_Label ?wdpqLabel ?pq_Label {
  VALUES (?company) {(wd:Q95)}

  ?company ?p ?statement .
  ?statement ?ps ?ps_ .

  ?wd wikibase:claim ?p.
  ?wd wikibase:statementProperty ?ps.

  OPTIONAL {
    ?statement ?pq ?pq_ .
    ?wdpq wikibase:qualifier ?pq .
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ?wd ?statement ?ps_

Only qualifiers and their values are included in the result. Neither provenance references nor value annotations (e.g. time precision) are included. Please write a comment if you need to add them.
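Because the query returns one flat row per (statement, qualifier) pair, a client usually has to regroup the rows to get back the nested layout of the web page. The following is a minimal Python sketch of that step, not part of the original answer. It assumes `?statement` is added to the SELECT clause so rows can be grouped, and the helper names (`build_query`, `run_query`, `group_rows`) and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same pattern as the query above, with ?statement added to the SELECT
# clause (an assumption of this sketch) so rows can be grouped client-side.
QUERY_TEMPLATE = """
SELECT ?statement ?wdLabel ?ps_Label ?wdpqLabel ?pq_Label {
  VALUES (?company) {(wd:%s)}
  ?company ?p ?statement .
  ?statement ?ps ?ps_ .
  ?wd wikibase:claim ?p .
  ?wd wikibase:statementProperty ?ps .
  OPTIONAL {
    ?statement ?pq ?pq_ .
    ?wdpq wikibase:qualifier ?pq .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ?wd ?statement ?ps_
"""


def build_query(qid):
    """Fill in the item ID, rejecting anything that is not of the form Q<digits>."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected an item ID such as 'Q95', got %r" % qid)
    return QUERY_TEMPLATE % qid


def run_query(query):
    """Send the query to WDQS and return the list of result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={
        "User-Agent": "qualifier-sketch/0.1 (illustrative example)",
        "Accept": "application/sparql-results+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def group_rows(bindings):
    """Collapse flat result rows into one entry per statement."""
    grouped = {}
    for row in bindings:
        def val(name):
            # Unbound OPTIONAL variables are simply absent from the row.
            return row.get(name, {}).get("value")

        entry = grouped.setdefault(row["statement"]["value"], {
            "property": val("wdLabel"),
            "value": val("ps_Label"),
            "qualifiers": [],
        })
        if "wdpqLabel" in row:
            entry["qualifiers"].append((val("wdpqLabel"), val("pq_Label")))
    return list(grouped.values())
```

Calling `group_rows(run_query(build_query("Q95")))` should then give one dictionary per statement, each with a (possibly empty) list of qualifier label/value pairs.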

Stanislav Kralin
  • A question about the SPARQL snippet: would it be possible to add the entity node's object ID for all of the returned data? To be precise, for your example with Google (wdLabel: founded by, ps_Label: Larry Page), would it be possible to also return the IDs behind ps_Label and pq_Label (if they are entities)? – fabmeyer Sep 02 '23 at 09:20
  • @fabmeyer, like this: https://w.wiki/7N$B ? – Stanislav Kralin Sep 02 '23 at 09:39
  • Genius! Where can I learn that properly? – fabmeyer Sep 02 '23 at 10:14
  • @fabmeyer, that's basic SPARQL. The label service is a WDQS extension. More info on the Wikibase RDF dump format: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format – Stanislav Kralin Sep 02 '23 at 10:58
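
On the ID question in the comments: one way to get IDs (not necessarily what the linked query does) is to select the value variables `?ps_` and `?pq_` alongside their `…Label` counterparts. For entity values the endpoint then returns full URIs, which a client can shorten. A minimal Python sketch, assuming the standard SPARQL JSON result format; `entity_id` is a hypothetical helper name:

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def entity_id(binding):
    """Return the bare ID (e.g. 'Q4934') for an entity URI binding, else None.

    `binding` is one variable's entry from a SPARQL JSON result row,
    e.g. {"type": "uri", "value": "http://www.wikidata.org/entity/Q4934"}.
    Literals (dates, strings, quantities) and other URIs yield None.
    """
    if binding.get("type") != "uri":
        return None
    value = binding.get("value", "")
    if not value.startswith(ENTITY_PREFIX):
        return None
    return value[len(ENTITY_PREFIX):]
```

For a result row `row`, `entity_id(row["ps_"])` gives the ID of the statement value when it is an entity and `None` when it is a literal.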