I want to query Wikidata for the string "Merkel".

I want the output to be something like:

  ID         instance_of   LABEL          
 ---------- ------------- --------------- 
  Q567       Q5            Angela Merkel  
  Q1921787   Q101352       Merkel         
  Q969485    Q1093829      Merkel         

This is what I have so far, tried on the Wikidata Query Service:

SELECT ?instance_of ?s ?p ?o ?label WHERE {
  ?s ?label "Merkel"@de .
  ?s ?p ?o .
  OPTIONAL { ?s wdt:P31 ?instance_of . }
}

I want the search results to be ranked from most to least popular/relevant, but I have no idea how to do that.

The Wikidata website ranks results when you search for a term. Perhaps ordering by the number of statements and sitelinks is a possible solution.
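
The sitelink part of that idea can be sketched roughly as follows. This is only a sketch for the Wikidata Query Service, assuming its predefined prefixes, that an exact match on the German label or alias is acceptable, and that the per-item sitelink count (`wikibase:sitelinks`) is a reasonable stand-in for popularity. The variable names are illustrative.

```sparql
SELECT DISTINCT ?item ?itemLabel ?instance_of ?sitelinks WHERE {
  # Exact match on the German label or on any German alias
  ?item rdfs:label|skos:altLabel "Merkel"@de .
  # Sitelink count stored on the item, used here as a popularity proxy (assumption)
  ?item wikibase:sitelinks ?sitelinks .
  # Items without an "instance of" statement are still returned
  OPTIONAL { ?item wdt:P31 ?instance_of . }
  # Resolve ?itemLabel, preferring German and falling back to English
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en" . }
}
ORDER BY DESC(?sitelinks)
```

An item with several `wdt:P31` values appears once per value, so the result can contain more rows than matching items.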

Stanislav Kralin
pajamas
  • You first need some kind of ranking measure in the dataset. – UninformedUser Sep 29 '18 at 13:26
  • Well, and indeed this was already asked before (with a great answer by the way): https://stackoverflow.com/questions/39438022/wikidata-results-sorted-by-something-similar-to-a-pagerank – UninformedUser Sep 29 '18 at 13:28
  • Possible duplicate of [Wikidata results sorted by something similar to a PageRank](https://stackoverflow.com/questions/39438022/wikidata-results-sorted-by-something-similar-to-a-pagerank) – UninformedUser Sep 29 '18 at 13:28
  • I am looking for items by alias. At this point, ranking by the number of sitelinks will do. – pajamas Sep 29 '18 at 20:32
  • Ok, then just reuse the query in the answer I referred to. Or does it not work? – UninformedUser Sep 30 '18 at 03:22
  • How do I combine "rank by" with "search in alias"? The query from the link you mentioned does not work. – pajamas Sep 30 '18 at 10:33
  • What? Works as expected, e.g. ordered by incoming links: `SELECT ?s ?instance_of WHERE { ?s rdfs:label "Merkel"@de. ?s wdt:P31 ?instance_of. { SELECT (count(?s) AS ?incoming) (?item as ?s) WHERE { ?item rdfs:label "Merkel"@de. ?s ?p ?item . [] wikibase:directClaim ?p } GROUP BY ?item } } ORDER BY DESC (?incoming)` – UninformedUser Sep 30 '18 at 12:36
  • Is there a way to implement a fuzzy match on the keyword, something like `FILTER (CONTAINS(LCASE(STR(?label)), "merkel"))`? Right now the query only finds items that match the keyword exactly. – pajamas Oct 01 '18 at 09:38
  • Well, either the `filter(contains(...` as in your comment or `filter(regex(...`. Both might fail with a timeout, though, given that we don't have any full-text index and thus have to scan the whole dataset. – UninformedUser Oct 01 '18 at 12:53
  • Where exactly do I put it in the above query? I keep getting a ``"Query is malformed: Encountered " "filter""`` error – pajamas Oct 01 '18 at 14:01
  • Just put the filter expression inside both the inner SELECT and the outer SELECT, and don't forget to replace `"Merkel"@de` with `?label`. – UninformedUser Oct 01 '18 at 16:07
  • Full-text search is another interesting question... If you just need a ready-to-use query that gives results ordered as on the Wikidata search page (and not as in the autosuggestions), you can use the Wikidata API. It is possible to call this API from SPARQL. https://pastebin.com/DDExvaGP – Stanislav Kralin Oct 01 '18 at 16:15
  • @Stanislav Kralin Thanks! This is exactly what I wanted. Could you post it as an answer below so that I can accept it? – pajamas Oct 02 '18 at 09:30
  • @pajamas, thank you. You'd do better to upvote this community wiki answer: https://stackoverflow.com/a/52310764/7879193 :-) BTW, it seems that the Wikidata full-text search API is slower than the Wikipedia one. – Stanislav Kralin Oct 02 '18 at 09:38
  • yes, it's going to take ages – pajamas Oct 02 '18 at 09:48
  • @pajamas, there are undocumented tricks: https://phabricator.wikimedia.org/T177275#4631207 – Stanislav Kralin Oct 02 '18 at 12:43
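
Calling the Wikidata search API from SPARQL, as suggested in the comments, could look roughly like the sketch below. It does not reproduce the pastebin query. It assumes the query service's MWAPI service (`wikibase:mwapi`) with its `Search` API, the `mwapi:srsearch` parameter, and the `wikibase:apiOrdinal` output for result position. The variable names are illustrative.

```sparql
SELECT ?item ?itemLabel ?instance_of ?rank WHERE {
  SERVICE wikibase:mwapi {
    # Full-text search on wikidata.org (assumed MWAPI "Search" service)
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "Search" ;
                    mwapi:srsearch "Merkel" .
    # Page title of each hit; for items this is the Q-id, e.g. "Q567"
    ?title wikibase:apiOutput mwapi:title .
    # Position of the hit in the API result list
    ?rank wikibase:apiOrdinal true .
  }
  # Turn the Q-id into an entity IRI
  BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?title)) AS ?item)
  OPTIONAL { ?item wdt:P31 ?instance_of . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en" . }
}
ORDER BY ?rank
```

Ordering by `?rank` keeps the relevance order returned by the search API, so no separate popularity measure is needed in the query itself.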

0 Answers