I have the query shown below:

SELECT DISTINCT ?dataset ?title WHERE {

  ?dataset a dcat:Dataset ;
           dcterms:title ?title ;
           dcterms:description ?description .

  # keyword_1 in title or description
  { ?dataset dcterms:title ?title .
    ?title bif:contains "'keyword_1'" }
  UNION
  { ?dataset dcterms:description ?description .
    ?description bif:contains "'keyword_1'" }

  # keyword_2 in title or description
  { ?dataset dcterms:title ?title .
    ?title bif:contains "'keyword_2'" }
  UNION
  { ?dataset dcterms:description ?description .
    ?description bif:contains "'keyword_2'" }
}

Semantically, this query is supposed to return all datasets that have "keyword_1" in either their title or description (the first UNION clause) and "keyword_2" in either their title or description (the second UNION clause). The intent is to intersect the two UNION clauses, that is, to get only those datasets that fulfill both.

This validator tells me that the query is syntactically correct. However, when sending the query to Virtuoso, the following error is returned:

Virtuoso 37000 Error SP031: SPARQL compiler: Internal error: sparp_find_triple_with_var_obj_of_freetext(): lost connection between triple pattern and an ft predicate


SPARQL query:
define sql:big-data-const 0 

output-format:text/html
define sql:signal-void-variables 1 

Do you have any idea what's going on? I don't understand what Virtuoso is trying to tell me with "lost connection between triple pattern and an ft predicate"...

Thanks in advance!

fritten
  • This could be a bug in the optimizer in combination with the full-text index. To get better support from the devs, such as TallTed (he'll surely respond here), you should provide the Virtuoso version you're using. – UninformedUser Mar 17 '19 at 14:19

1 Answer

This may be a bug in the query executor or optimizer. Virtuoso experts such as TallTed know better and will give you support. I can at least reproduce it on https://www.europeandataportal.eu/sparql, which runs Virtuoso version 07.20.3230 on Linux (x86_64-unknown-linux-gnu), Single Server Edition.

More importantly, your query looks more complex than necessary: you could use a FILTER that combines logical || with &&. At least, that's what I thought.

Unfortunately, that fails with the error

Virtuoso 37000 Error SP031: SPARQL compiler: No suitable triple pattern is found for a variable $description in special predicate bif:contains() at line 7 of query

and neither

SELECT DISTINCT ?dataset ?title WHERE { 
  ?dataset a dcat:Dataset ; 
  dcterms:title ?title ; 
  dcterms:description ?description .
  filter( (bif:contains(?title, "'keyword_1'") || bif:contains(?description,"'keyword_1'")) 
            && 
          (bif:contains(?title, "'keyword_2'") || bif:contains(?description,"'keyword_2'"))
  )   
}

nor

SELECT DISTINCT ?dataset ?title WHERE { 
  ?dataset a dcat:Dataset ; 
  dcterms:title ?title ; 
  dcterms:description ?description .
  filter(bif:contains(?title, "'keyword_1'") || bif:contains(?description,"'keyword_1'"))
  filter(bif:contains(?title, "'keyword_2'") || bif:contains(?description,"'keyword_2'"))         
}

works as I'd expect.

(Verbose) workaround using subqueries:

SELECT DISTINCT ?dataset ?title WHERE { 
 {
  select ?dataset ?title { 
  ?dataset a dcat:Dataset ; 
           dcterms:title ?title ; 
           dcterms:description ?description .
  filter( bif:contains(?title, "'keyword_1'") || bif:contains(?description,"'keyword_1'")) 
  }
 }
 {
  select ?dataset ?title { 
  ?dataset a dcat:Dataset ; 
           dcterms:title ?title ; 
           dcterms:description ?description .
  filter( bif:contains(?title, "'keyword_2'") || bif:contains(?description,"'keyword_2'"))
  } 
 }     
}
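
The same pattern should extend to more keywords by adding one subquery per keyword: every subquery projects the same ?dataset and ?title, so the implicit join between them keeps only the rows that satisfy all of them. Below is an untested sketch along those lines. Note that `keyword_3` is a hypothetical third keyword that is not part of the question, and the PREFIX lines spell out the standard DCAT and Dublin Core Terms namespaces in case the endpoint does not predefine them (`bif:` is Virtuoso-specific and built in).

```sparql
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?dataset ?title WHERE {
  # One subquery per keyword; the join on ?dataset and ?title
  # between the subqueries acts as the intersection.
  {
    SELECT ?dataset ?title {
      ?dataset a dcat:Dataset ;
               dcterms:title ?title ;
               dcterms:description ?description .
      FILTER( bif:contains(?title, "'keyword_1'") || bif:contains(?description, "'keyword_1'") )
    }
  }
  {
    SELECT ?dataset ?title {
      ?dataset a dcat:Dataset ;
               dcterms:title ?title ;
               dcterms:description ?description .
      FILTER( bif:contains(?title, "'keyword_2'") || bif:contains(?description, "'keyword_2'") )
    }
  }
  {
    # keyword_3 is an assumption added for illustration only
    SELECT ?dataset ?title {
      ?dataset a dcat:Dataset ;
               dcterms:title ?title ;
               dcterms:description ?description .
      FILTER( bif:contains(?title, "'keyword_3'") || bif:contains(?description, "'keyword_3'") )
    }
  }
}
```

Each subquery keeps its own FILTER with a single `||`, which is the form that worked in the two-keyword version above; whether the optimizer handles a longer chain equally well is something you would have to check on your Virtuoso version.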
UninformedUser
  • Thanks for your input. In fact, I am also using the European Data Portal. If I am not mistaken, your suggested query is semantically slightly different: mine would also match if both keyword_1 and keyword_2 appear in a dataset's description, whereas yours wouldn't. I tried to model this behaviour using the CONTAINS function in combination with FILTER, but these queries reliably ran into timeouts as soon as more than one keyword was involved. Hence my attempt to use UNION, in the hope that this would run faster. – fritten Mar 17 '19 at 14:45
  • My original query: `SELECT DISTINCT ?dataset ?title WHERE { ?dataset a dcat:Dataset ; dcterms:title ?title ; dcterms:description ?description FILTER ( ( contains(str(?title), "keyword_1") || contains(str(?description), "keyword_1") ) && ( contains(str(?title), "keyword_2") || contains(str(?description), "keyword_2") ) ) }` – fritten Mar 17 '19 at 14:52
  • Right, I'm dumb and did not read your query carefully. Regarding CONTAINS: it's slower because it doesn't use the full-text index, so the whole dataset is scanned and the string operation is applied to each string. Unfortunately, `bif:contains` fails in the FILTER. – UninformedUser Mar 17 '19 at 15:10
  • Just to clarify, you want both keywords to appear in either the title or the description, or both, right? – UninformedUser Mar 17 '19 at 15:12
  • Exactly, both keywords need to appear somewhere, either both in the title, both in the description, or one in the title and one in the description. I read that bif:contains is much faster than the regular CONTAINS as it utilizes some sort of precomputed index, but the only way I could come up with to express the desired semantics was using UNION. I just tried your subquery workaround, and it works flawlessly. Thanks a bunch! I would still be curious as to why the UNION query is rejected by Virtuoso... – fritten Mar 17 '19 at 19:54
  • I guess it just has some issues with the full-text index when it is used in non-trivial clauses, as I've also shown with the simple `FILTER(... && ...)`, which fails. You should open a [GitHub issue](https://github.com/openlink/virtuoso-opensource/issues). Maybe the devs are already aware of it; if not, at least they will know about it now. – UninformedUser Mar 18 '19 at 07:10
  • Thanks for the suggestion, I just opened a new [issue](https://github.com/openlink/virtuoso-opensource/issues/835) – fritten Mar 20 '19 at 16:20