
I'm new to SPARQL, and to Wikidata for that matter. I'm trying to allow my users to search Wikidata for people, and people only; I don't want any results to be a motorcycle brand or anything.

So I was playing around over here with the following query:

SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?person rdfs:label ?personLabel .
  }
  FILTER regex(?personLabel, "Albert", "i").
}
LIMIT 10

Though this eventually returns a result, it is hardly as fast as I'd like it to be. Note that it also just times out if you try the above query with a longer name.

All the example queries I've worked with, found here, assume that you already have an entity to query from. In my case there is nothing to go on, since I'm trying to query for someone with a certain name. I'm probably making some wrong assumptions about the inner workings of the database I'm working with, but I'm not sure what they are.

Any ideas?

Prowling Duck
  • What is the question now? The performance? REGEX over all persons in Wikidata is for sure slow. And as it is a public server, you cannot be sure of having the same "power" for your query all the time. It's a shared service. – UninformedUser Sep 29 '16 at 19:26
  • If you can leave SPARQL for a programmatic solution this looks promising (using node in the browser): https://github.com/cwrc/wikidata-entity-lookup – happybeing Apr 04 '20 at 17:47
  • Here’s an answer using the full text index, that has been added in the meantime: https://stackoverflow.com/a/62126802/4494 – Matthias Winkelmann Aug 07 '21 at 10:13

3 Answers


The problem with doing a free text search with Wikidata is that it does not have a free text index (yet). Without an index, a text search has to try a match against every label, which is not efficient. I could not come up with a query that searches for "Albert Einstein" and does not time out. An exact match (?person rdfs:label "Albert Einstein"@en .) does work, of course, but presumably that doesn't fit your needs. It would help if you could narrow down the selection of people in some other way first.

DBpedia (http://dbpedia.org/sparql), on the other hand, has Virtuoso's bif:contains available, so this works quite fast there (http://yasgui.org/short/HJeZ4kjp):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
  ?sub a foaf:Person .
  ?sub rdfs:label ?lbl .
  ?lbl bif:contains "Albert AND Einstein" .
  filter(langMatches(lang(?lbl), "en"))
} 
LIMIT 10
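
If the search text comes from your users, the bif:contains expression has to be assembled from their input. Here is a minimal sketch in Python, assuming a hypothetical helper named build_contains_expression (not part of Virtuoso or DBpedia) that joins the words of the name with AND, as in the query above. It keeps only word characters so that quotes typed by a user cannot break out of the string literal; it does not handle names that themselves contain operator words such as AND or OR.

```python
import re


def build_contains_expression(user_input: str) -> str:
    """Join the words of a name with AND, e.g. "Albert AND Einstein".

    Hypothetical helper for illustration; the result is meant to be
    placed inside the double-quoted bif:contains string literal.
    """
    # Keep only runs of word characters (letters, digits, underscore)
    # so quotes and punctuation cannot alter the surrounding query.
    words = re.findall(r"\w+", user_input)
    if not words:
        raise ValueError("search text contains no words")
    return " AND ".join(words)
```

Calling build_contains_expression("Albert Einstein") gives the same expression that is hard-coded in the query above.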
evsheino
  • Your DBpedia solution works but the Wikidata issue that you mentioned was closed because someone said we can use MWAPI to run fulltext search over any wiki (including Wikidata). Do you know how to do it? Another solution for Wikipedia would be to search in the "Living people" category like this for people named Nice: https://en.wikipedia.org/w/index.php?search=intitle%3Anice+incategory%3A%22living+people%22&title=Special:Search&profile=advanced&fulltext=1&ns0=1 But unfortunately there is no general people category used on articles to include people that don't live anymore. – baptx Jul 27 '21 at 15:06

You can try matching the label directly instead of using a filter:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5.
  # ?label is a variable, so this matches any property whose value is exactly "Einstein"@en.
  ?item ?label "Einstein"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

See it on Wikidata Query Service

But I'm not sure if you can use wildcards for search.

Alexan
  • This is a lot faster, but also requires you have the correct CaSe in your search string. I.e. this is not case insensitive, so be careful. – Andy Mar 30 '17 at 00:05
  • @Andy, yes I know, but sometimes you need case sensitive – Alexan Mar 30 '17 at 00:57
  • There are missing results with this solution, any idea why? For example when searching the first name Nice on https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%0AWHERE%20{%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5.%0A%20%20%3Fitem%20%3Flabel%20%22Nice%22%40en%20.%0A%20%20SERVICE%20wikibase%3Alabel%20{bd%3AserviceParam%20wikibase%3Alanguage%20%22[AUTO_LANGUAGE]%2Cen%22.}%0A} it does not display results like https://www.wikidata.org/wiki/Q22279400. The other answer using DBpedia works. – baptx Jul 27 '21 at 14:51

The following query might be what you are looking for:

SELECT DISTINCT ?item ?itemLabel ?dateOfBirth 
WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Search";
                    wikibase:endpoint "www.wikidata.org";
                    mwapi:srsearch "Franz Kafka haswbstatement:P31=Q5".
    ?item wikibase:apiOutputItem mwapi:title .
  }
  OPTIONAL {?item wdt:P569 ?dateOfBirth . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

See also https://www.wikidata.org/wiki/Wikidata:Request_a_query#How_to_query_for_people_by_first_and%2For_last_name%3F
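
To run this kind of search for a name typed in by a user, the query text can be generated rather than hard-coded. Below is a rough sketch in Python, assuming a hypothetical function build_person_search_query; the function name, the limit parameter, and the added LIMIT clause are illustrative and not part of the query above. It only builds the query string (sending it to the endpoint is left out) and escapes the name so it stays inside the SPARQL string literal.

```python
def build_person_search_query(name: str, limit: int = 10) -> str:
    """Build an MWAPI-based SPARQL query that searches Wikidata for humans.

    Hypothetical helper for illustration; it returns query text only.
    """
    # Collapse whitespace (including newlines, which are not allowed
    # inside a one-line SPARQL string literal).
    cleaned = " ".join(name.split())
    # Escape backslashes first, then double quotes, so the name cannot
    # terminate the string literal early.
    escaped = cleaned.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT DISTINCT ?item ?itemLabel WHERE {{
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:api "Search";
                    wikibase:endpoint "www.wikidata.org";
                    mwapi:srsearch "{escaped} haswbstatement:P31=Q5".
    ?item wikibase:apiOutputItem mwapi:title .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
LIMIT {int(limit)}"""
```

The haswbstatement:P31=Q5 part is what restricts the full-text hits to items that are instances of human, so the name itself can be any free text.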