
I'm trying to understand how best to handle literals in MarkLogic SPARQL data that may be in any case. I'd like to be able to do a case-insensitive search, but I believe that isn't possible with semantic queries. For a simplistic example, I want:

SELECT *
WHERE { ?s ?p "Red"}

and

SELECT *
WHERE { ?s ?p "red"}

to return all values whether the object is "Red", "RED", "red" or "rED".

My data comes from another source that has variable capitalisation rules. At the moment, the only thing I can think of is to add an extra triple that always contains the text in lower case, so I can always search on that value. Alternatively, would it make sense to create a new range query in MarkLogic with a case-insensitive collation (if that's possible on triple data)?
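The extra-triple idea can be sketched outside MarkLogic: when loading data, store a lower-cased copy of each literal under a separate predicate, then lower-case the search term before doing an exact match. This is a minimal in-memory Python sketch, assuming triples are plain string tuples; the predicate names `ex:colour` and `ex:colourLower` are hypothetical.

```python
# Hypothetical sketch: triples as (subject, predicate, object) string tuples.
# "ex:colour" and "ex:colourLower" are made-up predicate names, not real vocabulary.
Triple = tuple[str, str, str]


def add_lowercase_companions(
    triples: list[Triple], predicate: str, lower_predicate: str
) -> list[Triple]:
    """Return the input triples plus one lower-cased companion triple
    (using `lower_predicate`) for every triple that uses `predicate`."""
    enriched = list(triples)
    for s, p, o in triples:
        if p == predicate:
            enriched.append((s, lower_predicate, o.lower()))
    return enriched


def match_exact(triples: list[Triple], predicate: str, value: str) -> list[str]:
    """Exact, case-sensitive lookup, like WHERE { ?s <predicate> "value" }."""
    return sorted(s for s, p, o in triples if p == predicate and o == value)


data = [
    ("ex:a", "ex:colour", "Red"),
    ("ex:b", "ex:colour", "RED"),
    ("ex:c", "ex:colour", "rED"),
    ("ex:d", "ex:colour", "Blue"),
]
enriched = add_lowercase_companions(data, "ex:colour", "ex:colourLower")

# An exact match on the original values finds nothing spelled exactly "red".
print(match_exact(enriched, "ex:colour", "red"))  # []
# Lower-casing the search term too makes "Red" and "red" behave the same.
print(match_exact(enriched, "ex:colourLower", "Red".lower()))  # ['ex:a', 'ex:b', 'ex:c']
```

The cost is extra storage and a second predicate that has to be kept in sync with the original, in exchange for a plain exact match on the normalized value instead of a per-triple string conversion at query time.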

Joshua Taylor
Millstone1998
  • possible duplicate of [How to write SPARQL query that efficiently matches string literals while ignoring case](http://stackoverflow.com/questions/10660030/how-to-write-sparql-query-that-efficiently-matches-string-literals-while-ignorin) – Joshua Taylor Dec 02 '14 at 22:21


You could use a FILTER that ignores case by lower-casing the object's string value before comparing it:

select * where {
  ?s ?p ?o
  FILTER (lcase(str(?o)) = "red")
}

Based on the answer to another question.
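The FILTER compares lower-cased strings, so the constant on the right must already be lower case; to accept either "Red" or "red" from the caller, both sides can be lower-cased, e.g. `FILTER (lcase(str(?o)) = lcase("Red"))`. The Python sketch below mimics that comparison over an in-memory list of triples (an illustrative assumption; MarkLogic evaluates the real FILTER itself) and counts how many triples had to be inspected.

```python
# Illustrative model of FILTER (lcase(str(?o)) = lcase(term)) over an in-memory
# list of (subject, predicate, object) tuples; not how MarkLogic executes it.
def filter_case_insensitive(triples, term):
    wanted = term.lower()  # plays the role of lcase("...") on the constant
    examined = 0
    matches = []
    for s, p, o in triples:  # ?s ?p ?o binds every triple in the store
        examined += 1
        if str(o).lower() == wanted:  # lcase(str(?o))
            matches.append((s, p, o))
    return matches, examined


triples = [
    ("ex:a", "ex:colour", "Red"),
    ("ex:b", "ex:colour", "RED"),
    ("ex:c", "ex:colour", "rED"),
    ("ex:d", "ex:colour", "Blue"),
    ("ex:e", "ex:label", "red"),
]

upper, n_upper = filter_case_insensitive(triples, "Red")
lower, n_lower = filter_case_insensitive(triples, "red")
print(len(upper), n_upper)  # 4 5
print(upper == lower)       # True
```

Like the original query, this matches "red" under any predicate, and it touches every triple; adding a specific predicate to the triple pattern narrows both the matches and the work.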

Edit: I asked Steve Buxton, MarkLogic's PM for semantics features, and he suggested this:

let $store := sem:store( (), cts:element-value-query(xs:QName("sem:object"), "red", "case-insensitive") )
return
  sem:sparql('
    SELECT ?o
    WHERE {
      ?s ?p ?o
      FILTER (lcase(str(?o)) = "red")
    }', (), (), $store
 )

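sem:store is a MarkLogic 8 function (available through Early Access at the time of writing) that selects a group of triples; here the selection is driven by a case-insensitive cts:element-value-query on sem:object. The SPARQL query then runs on that reduced set, so far fewer triples have to go through the FILTER.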
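The two-stage idea (narrow the candidates with an index, then run the exact FILTER on what remains) can be sketched in Python. It assumes, as I understand MarkLogic's sem:store documentation, that the cts:query selects the documents (fragments) holding matching triples and that every triple in those documents is handed to SPARQL, which is why the FILTER is still needed. The document layout and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical model: each "document" holds several triples, since MarkLogic
# stores triples inside documents.
docs = {
    "doc1": [("ex:a", "ex:colour", "Red"), ("ex:a", "ex:size", "Large")],
    "doc2": [("ex:b", "ex:colour", "Blue")],
    "doc3": [("ex:c", "ex:colour", "rED"), ("ex:c", "ex:shape", "Round")],
}


def build_value_index(docs):
    """Map each lower-cased object value to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, triples in docs.items():
        for _, _, o in triples:
            index[o.lower()].add(doc_id)
    return index


def query(docs, index, term):
    wanted = term.lower()
    # Stage 1 (like sem:store with a case-insensitive value query):
    # pick candidate documents from the index without scanning triples.
    candidates = index.get(wanted, set())
    # Stage 2 (like the SPARQL FILTER): check every triple in those documents,
    # because a selected document can also contain triples that do not match.
    return sorted(
        (s, p, o)
        for doc_id in candidates
        for s, p, o in docs[doc_id]
        if o.lower() == wanted
    )


index = build_value_index(docs)
print(query(docs, index, "red"))
# [('ex:a', 'ex:colour', 'Red'), ('ex:c', 'ex:colour', 'rED')]
```

Only doc1 and doc3 reach stage 2, and the FILTER step still discards `("ex:a", "ex:size", "Large")`, which came along with its document.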

Community
Dave Cassel
  • You can also use [cts:contains](http://docs.marklogic.com/cts:contains). A string query (2nd argument) is coerced to a [cts:word-query](http://docs.marklogic.com/cts:word-query), which is case-insensitive for lower-case strings. – joemfb Dec 02 '14 at 19:49
  • @joemfb good suggestion, though it should be pointed out that those are product-specific extensions that are not part of the SPARQL standard, and so queries using them will not be portable to other SPARQL stores. – Jeen Broekstra Dec 02 '14 at 21:56
  • Plus one as this is a good solution. Minus one because if it's based on another answer, there's a good chance that you should just flag the question as a duplicate. There's nothing the matter with a duplicate question, and it's better to have a canonical answer rather than lots of very similar questions floating around. – Joshua Taylor Dec 02 '14 at 22:23
  • It's the same question asked about different platforms, which allows for potentially different answers. – Dave Cassel Dec 03 '14 at 12:03
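joemfb's cts:contains suggestion relies on the default case rule documented for cts:word-query: when neither the "case-sensitive" nor the "case-insensitive" option is given, a term with no uppercase letters is matched case-insensitively and a term containing uppercase is matched case-sensitively. The Python sketch below models only that rule; the function names are hypothetical and the word splitting is a deliberate simplification of MarkLogic's tokenizer. It also shows one difference from the FILTER approach: a word query matches a word inside a longer value such as "Red Sox", whereas `lcase(str(?o)) = "red"` compares the whole value.

```python
import re


def default_case_sensitive(term: str) -> bool:
    # Documented cts:word-query default when neither "case-sensitive" nor
    # "case-insensitive" is given: any uppercase letter makes it case-sensitive.
    return any(ch.isupper() for ch in term)


def word_query_matches(text: str, term: str) -> bool:
    """Simplified single-word match. Splits on runs of word characters, which is
    NOT MarkLogic's tokenizer, and ignores stemming, diacritics and phrases."""
    words = re.findall(r"\w+", text)
    if default_case_sensitive(term):
        return term in words
    return term.lower() in (w.lower() for w in words)


print(word_query_matches("RED", "red"))      # True
print(word_query_matches("RED", "Red"))      # False
print(word_query_matches("Red Sox", "red"))  # True
```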