
I am seeing an issue where a cts:element-word-query fails to select any items if the search term contains more than two words (including a possessive "'s"). This happens on our production server running 7.0-4.3, but not on our development server running 7.0-5.4.

Comparing the output of pkg:database-configuration() on the two servers shows no clear reason for the difference.

The following XQuery:

for $x in ((//ch_firstSource)[1 to 10])
let $q := cts:element-word-query(xs:QName('ch_firstSource'), (string($x)))
return (
    $x, 
    xdmp:estimate(cts:search(collection(),$q)), 
    cts:highlight($x, $q, element hit {$cts:text})
)

Produces the following result in production:

<ch_firstSource>Authentic Copy New Constit. France</ch_firstSource>
0
<ch_firstSource><hit>Authentic Copy New Constit. France</hit></ch_firstSource>
<ch_firstSource>Facsimiles National MSS Scotl.</ch_firstSource>
0
<ch_firstSource><hit>Facsimiles National MSS Scotl.</hit></ch_firstSource>
<ch_firstSource>Geoffrey Chaucer</ch_firstSource>
50900
<ch_firstSource><hit>Geoffrey Chaucer</hit></ch_firstSource>
<ch_firstSource>Thomas Newton</ch_firstSource>
1771
<ch_firstSource><hit>Thomas Newton</hit></ch_firstSource>
<ch_firstSource>Apocalypse St. John: A Version</ch_firstSource>
0
<ch_firstSource><hit>Apocalypse St. John: A Version</hit></ch_firstSource>
<ch_firstSource>Apocalypse St. John: A Version</ch_firstSource>
0
<ch_firstSource><hit>Apocalypse St. John: A Version</hit></ch_firstSource>
<ch_firstSource>Stephen Hawes</ch_firstSource>
2117
<ch_firstSource><hit>Stephen Hawes</hit></ch_firstSource>
<ch_firstSource>Stephen Hawes</ch_firstSource>
2117
<ch_firstSource><hit>Stephen Hawes</hit></ch_firstSource>
<ch_firstSource>Bede's Ecclesiastical History</ch_firstSource>
0
<ch_firstSource><hit>Bede's Ecclesiastical History</hit></ch_firstSource>
<ch_firstSource>Pseudo-Apuleius' Herbarium</ch_firstSource>
0
<ch_firstSource><hit>Pseudo-Apuleius' Herbarium</hit></ch_firstSource>

A larger set includes:

<ch_firstSource>R. Whitford</ch_firstSource>
411
<ch_firstSource><hit>R. Whitford</hit></ch_firstSource>

and

<ch_firstSource>William Durrant Cooper</ch_firstSource>
0
<ch_firstSource><hit>William Durrant Cooper</hit></ch_firstSource>

On dev, the same query produces:

<ch_firstSource>Thomas Newton</ch_firstSource>
497
<ch_firstSource>
 <hit>Thomas Newton</hit>
</ch_firstSource>
<ch_firstSource>Stephen Marshall</ch_firstSource>
88
<ch_firstSource>
 <hit>Stephen Marshall</hit>
</ch_firstSource>
<ch_firstSource>Secreta Secretorum</ch_firstSource>
425
<ch_firstSource>
 <hit>Secreta Secretorum</hit>
</ch_firstSource>
<ch_firstSource>New Scientist</ch_firstSource>
421
<ch_firstSource>
 <hit>New Scientist</hit>
</ch_firstSource>
<ch_firstSource>Quarterly Review</ch_firstSource>
1226
<ch_firstSource>
 <hit>Quarterly Review</hit>
</ch_firstSource>
<ch_firstSource>Thomas Davis</ch_firstSource>
50
<ch_firstSource>
 <hit>Thomas Davis</hit>
</ch_firstSource>
<ch_firstSource>Arthur Young</ch_firstSource>
473
<ch_firstSource>
 <hit>Arthur Young</hit>
</ch_firstSource>
<ch_firstSource>William Durrant Cooper</ch_firstSource>
14
<ch_firstSource>
 <hit>William Durrant Cooper</hit>
</ch_firstSource>
<ch_firstSource>Westminster Gazette</ch_firstSource>
2629
<ch_firstSource>
 <hit>Westminster Gazette</hit>
</ch_firstSource>
<ch_firstSource>Deb. Congress 1808</ch_firstSource>
1
<ch_firstSource>
 <hit>Deb. Congress 1808</hit>
</ch_firstSource>

Does anyone have any ideas why this might be happening?

Ankit Bhardwaj
  • A little tricky to pinpoint the cause without knowing the exact database config. Then again, SO might not be the best channel for such in-depth investigation. – grtjn Mar 17 '17 at 18:55
  • Though, there have been a few bug fixes in the area of word queries between 7.0-4.3 and 7.0-5.4, so maybe it was just a bug that got fixed. – grtjn Mar 17 '17 at 18:56
  • That's useful to know. Thanks. – Marc Moskowitz Mar 17 '17 at 19:48

1 Answer


To look at the published bug fixes between versions 7.0-4.3 and 7.0-5.4, go to https://help.marklogic.com/Bugtrack/List and enter those versions in the From and To fields, then click Show. I don't see any that match your case, but it's worth a look.

I had a theory until I saw the "larger set" data: in your initial examples, every search term with more than two words also contained punctuation. (Your larger-set results, e.g. "William Durrant Cooper", appear to be a counterexample.) Just in case, could you run this:

for $x in ((//ch_firstSource)[1 to 10])
(: "punctuation-insensitive" is a word-query option, so it goes on the query, not on cts:search :)
let $q := cts:element-word-query(xs:QName('ch_firstSource'), string($x), "punctuation-insensitive")
return (
    $x,
    xdmp:estimate(cts:search(collection(), $q)),
    cts:highlight($x, $q, element hit {$cts:text})
)
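If that makes no difference, another check is to run the same diagnostic on both servers and compare how one of the failing values is tokenized and which index lookups the query is planned into. Below is a rough sketch. It hard-codes "Bede's Ecclesiastical History" from your production output as the sample, and it uses the built-ins cts:tokenize and xdmp:plan. I haven't run it against 7.0-4.3 specifically, so compare the two outputs side by side rather than expecting any particular result.

```xquery
xquery version "1.0-ml";

(: Diagnostic sketch: the sample text is one of the values that returned 0
   in production; the element name is the one from the question. :)
let $text := "Bede's Ecclesiastical History"
let $q := cts:element-word-query(xs:QName("ch_firstSource"), $text)
return (
  (: 1. How the phrase is split into word and punctuation tokens
        (space tokens are dropped to keep the output short). :)
  (
    for $token in cts:tokenize($text)
    return
      typeswitch ($token)
        case cts:word return <word>{ string($token) }</word>
        case cts:punctuation return <punctuation>{ string($token) }</punctuation>
        case cts:space return ()
        default return <other>{ string($token) }</other>
  ),
  (: 2. Unfiltered, index-only count, as in the original query. :)
  xdmp:estimate(cts:search(collection(), $q)),
  (: 3. The index lookups the server plans for this query. :)
  xdmp:plan(cts:search(collection(), $q))
)
```

If the token lists match on both servers but the plans differ, that would suggest the problem lies in how the query is resolved against the indexes rather than in tokenization.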
Dave Cassel
  • Correct, adding "punctuation-insensitive" to the query (: not the search :) does not fix the issue. I'll take a look at those bugs. – Marc Moskowitz Mar 23 '17 at 18:35