
I'm looking for a way to match claim values case-insensitively.

For instance, Wikidata has the following statement:

wd:Q1524522 wdt:P2002 'bouletcorp'

But suppose that, following the capitalization used on Twitter, I use the value Bouletcorp instead. That gives the following query, which fails to find any matching entity:

SELECT ?item WHERE {
  ?item wdt:P2002 "Bouletcorp" .
}


A solution could be to use a regex with the case-insensitive flag, as follows:

SELECT ?item WHERE {
  ?item wdt:P2002 ?twittername .
  FILTER (regex(?twittername, "Bouletcorp", "i"))
}


But how much less efficient would this query be? Isn't there a better way? As I understand it, this query makes the SPARQL engine run every triple that has a value for the requested property through a regex, which sounds inefficient. It's not that slow yet for P2002, but I guess properties with more than a million matching claims could be problematic, no?
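
One detail about the regex query above: SPARQL's `REGEX` matches anywhere in the string unless the pattern is anchored, so it would also return accounts whose name merely contains "bouletcorp". A minimal sketch of an anchored variant follows. The `PREFIX` line is the usual Wikidata one (predeclared on the Wikidata Query Service, spelled out here so the query is self-contained). This still scans every P2002 value, so it only tightens the match and does not address the efficiency concern:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?twittername WHERE {
  ?item wdt:P2002 ?twittername .
  # ^ and $ anchor the pattern so only the whole value can match;
  # the "i" flag makes the comparison case-insensitive.
  FILTER (REGEX(?twittername, "^Bouletcorp$", "i"))
}
```

If the search string comes from user input, any regex metacharacters in it would need escaping before being placed in the pattern.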

honk
maxlath
    There is nothing "better" in the SPARQL standard. REGEX will of course scan all values and apply the regex to each one, which can be slow. Depending on the triple store, there may be built-ins or extensions for full-text search. As far as I know, Wikidata is hosted on a Blazegraph triple store, so you might have a look at its documentation. – UninformedUser Mar 28 '17 at 18:24
    It doesn't avoid the "touch every triple" case, but you could do a `?s ?p ?v filter (lcase(?v) = 'the string')`, which wouldn't require a whole regular expression comparison to be introduced. Plus, you wouldn't have to worry about whether the string you're looking for happens to contain regular expression sensitive characters. – Joshua Taylor Mar 28 '17 at 21:29
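
The `lcase` suggestion in the comment above could be applied to the Twitter example roughly as follows. This is a sketch, not a tested benchmark: it assumes the usual Wikidata `wdt:` prefix, and it lowercases both sides so the search string can be written in any case. As the comment says, it still touches every P2002 value:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?twittername WHERE {
  ?item wdt:P2002 ?twittername .
  # Compare lowercased forms; no regex engine is involved, so
  # characters such as "." or "+" in the name need no escaping.
  FILTER (LCASE(?twittername) = LCASE("Bouletcorp"))
}
```

Unlike an unanchored regex, the equality test only matches the whole value, so names that merely contain the search string are not returned.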
