
My XQuery is:

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
return $attr

This returns: name="city" name="city" name="city" name="city" name="city"

When I add distinct-values like this:

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
return distinct-values($attr)

it returns: city city city city city

I need only one "city". How can I do that?

Christian Grün
Peter Fašianok

3 Answers

You need to apply the distinct-values function to the whole result (i.e., not to each individual result item):

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
distinct-values(
  for $schema in xsd:schema
  for $nodes in $schema//*,
      $attr in $nodes/xsd:element/@name
  where fn:contains($attr,'city')
  return $attr
)

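The query can also be written as a single XPath expression (in XQuery, the xs prefix is predeclared and bound to the XML Schema namespace, so no namespace declaration is needed):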

distinct-values(//xs:element/@name[contains(., 'city')])
Christian Grün
  • Is there a way to speed this up a bit? I am facing the problem that BaseX is pretty slow when using _distinct-values()_ or the _group by_ statement. Without them, the query finishes correctly in 4 s; with elimination of duplicate values, however, it takes about 4 minutes. The values are about 60,000 text nodes, each containing a lemma. 1 GB of memory is available for each query. The only way of speeding this up right now is the workaround of letting another language remove the duplicate values... not what I want, actually. – meistermuh Sep 13 '19 at 07:37
  • I invite you to send this to the BaseX mailing list and add further information on your use case, the specific query, etc. – Christian Grün Sep 13 '19 at 11:08
  • Thanks, I already joined the list. As for the problem mentioned above, I found out that there is a huge difference between case-sensitive and case-insensitive grouping: the latter is slow. So my *solution* was to use the default collation instead of the "html-ascii-case-insensitive" collation I had been using. – meistermuh Sep 24 '19 at 13:37
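A minimal sketch of the collation difference described in the last comment, using hypothetical hard-coded strings instead of the real lemma text nodes. It assumes the processor supports the HTML ASCII case-insensitive collation defined in XPath and XQuery Functions and Operators 3.1, and that the default collation is the Unicode codepoint collation. The third expression (folding case with lower-case before comparing) is a hypothetical alternative, not something from this thread; it also folds non-ASCII letters, so it is not strictly equivalent, and whether it is faster in BaseX would need to be measured.

```xquery
(: Hypothetical sample data standing in for the lemma text nodes :)
let $lemmas := ('City', 'city', 'CITY', 'town')
(: Collation URI from Functions and Operators 3.1; assumes processor support :)
let $ci := 'http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive'
return (
  (: default collation, assumed to be codepoint: case matters, 4 values :)
  count(distinct-values($lemmas)),
  (: case-insensitive collation: 'City', 'city', 'CITY' collapse, 2 values :)
  count(distinct-values($lemmas, $ci)),
  (: hypothetical alternative: lower-case first, then the default collation, 2 values :)
  count(distinct-values($lemmas ! lower-case(.)))
)
```

With the case-insensitive collation, which of the equal spellings is returned is implementation-dependent.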

Use group by. Your query returns city multiple times because in each iteration of the for loop, $attr holds only a single attribute. So you are applying distinct-values to a single item each time, and you are doing this once per iteration.

declare namespace xsd="http://www.w3.org/2001/XMLSchema"; 
for $schema in xsd:schema
for $nodes in $schema//*,
    $attr in $nodes/xsd:element/@name
where fn:contains($attr,'city')
group by $attr
return $attr
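A minimal sketch of that reasoning, with a hypothetical hard-coded sequence in place of the @name attributes from the schema: calling distinct-values inside the loop deduplicates each single item on its own, while calling it once on the whole sequence removes the duplicates.

```xquery
(: Hypothetical sample data standing in for the @name values found in the schema :)
let $names := ('city', 'city', 'city')
return (
  (: distinct-values called once per iteration only ever sees one item,
     so nothing is removed: returns 3 :)
  count(for $n in $names return distinct-values($n)),
  (: distinct-values called once on the whole sequence compares all items: returns 1 :)
  count(distinct-values($names))
)
```

The same reasoning explains why both wrapping the entire FLWOR expression in distinct-values and grouping the tuples with group by yield a single city.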
dirkk

This works:

declare namespace xsd="http://www.w3.org/2001/XMLSchema";
distinct-values(
  for $schema in xsd:schema
  for $nodes in $schema//*,
      $attr in $nodes/xsd:element/@name
  where fn:contains($attr,'city')
  return distinct-values($attr)
)
Peter Fašianok