I. You may do something like this in XSLT 2.0:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="s">
<xsl:variable name="vWords" select=
"tokenize(lower-case(string(.)),
'[\s.?!,;—:\-]+'
) [.]
"/>
<xsl:sequence select=
"if (some $i in 1 to count($vWords) - 1
satisfies ($vWords[$i] eq 'blood'
and
$vWords[$i+1] eq 'pressure'
)
)
then .
else ()
"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
When this XSLT 2.0 transformation is applied to the following XML document (the question did not provide one):
<t>
<s>He has high blood pressure.</s>
<s>He has high Blood Pressure.</s>
<s>He has high Blood
Pressure.</s>
<s>He was coldblood Pressured.</s>
</t>
the wanted, correct result (only the elements containing "blood" and "pressure", case-insensitively and as two adjacent words) is produced:
<s>He has high blood pressure.</s>
<s>He has high Blood Pressure.</s>
<s>He has high Blood
Pressure.</s>
Explanation:
Using the tokenize() function, applied to the lower-cased string value, to split on runs of whitespace and punctuation. Lower-casing makes the comparison case-insensitive, \s also matches line breaks, and the [.] predicate discards empty tokens.
Iterating through the result of tokenize() to find a "blood" word followed immediately by a "pressure" word.
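To see what the tokenizing step produces, here is a minimal sketch that prints the tokens for one sample sentence. The named template `main` and the variable `vText` are assumptions made for illustration only; the stylesheet is meant to be run with `main` as the initial template, so no source document is needed.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- Hypothetical entry point: invoke as the initial named template -->
 <xsl:template name="main">
  <!-- Sample sentence (an assumption for illustration) -->
  <xsl:variable name="vText" select="'He has high Blood Pressure.'"/>

  <!-- Same split as in the answer: the trailing '.' yields an empty
       last token, which the [.] predicate removes -->
  <xsl:variable name="vWords" select=
   "tokenize(lower-case($vText), '[\s.?!,;—:\-]+')[.]"/>

  <!-- Print the tokens separated by '|' -->
  <xsl:value-of select="$vWords" separator="|"/>
 </xsl:template>
</xsl:stylesheet>
```

This should print `he|has|high|blood|pressure`: five tokens, with "blood" at position 4 and "pressure" at position 5, which is exactly the adjacency the transformation tests for.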
II. An XSLT 1.0 solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vUpper" select=
"'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="vLower" select=
"'abcdefghijklmnopqrstuvwxyz'"/>
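<!-- Must hold at least as many spaces as there are non-letter characters
in any <s>: translate() deletes characters that have no counterpart here. -->
<xsl:variable name="vSpaaaceeees" select=
"'                                                                                '
"/>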
<xsl:variable name="vAlpha" select="concat($vLower, $vUpper)"/>
<xsl:template match="s">
<xsl:variable name="vallLower" select="translate(., $vUpper, $vLower)"/>
<xsl:copy-of select=
"self::*
[contains
(concat
(' ',
normalize-space
(translate($vallLower, translate($vallLower, $vAlpha, ''), $vSpaaaceeees)),
' '
),
' blood pressure '
)
]
"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
When this transformation is applied to the same XML document (above), the same correct result is produced:
<s>He has high blood pressure.</s>
<s>He has high Blood Pressure.</s>
<s>He has high Blood
Pressure.</s>
Explanation:
Converting to lowercase.
Using the double-translate method to replace every non-alphabetic character with a space.
Then using normalize-space() to replace any group of adjacent spaces with a single space.
Then surrounding this result with spaces.
Finally, checking whether the result contains the string " blood pressure ".
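The double-translate step is the least obvious part, so here is a minimal sketch that traces it on a single literal string. The names `vText`, `vNonAlpha`, `vSpaced`, `vPadded` and `vSpaces` are hypothetical, the sample text is assumed to be lower-cased already, and the stylesheet can be applied to any XML document because it only matches the document root.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>
 <!-- A run of spaces, at least as many as the non-letters
      in the sample below (six) -->
 <xsl:variable name="vSpaces" select="'          '"/>

 <xsl:template match="/">
  <!-- Sample text, already lower-cased (an assumption for illustration) -->
  <xsl:variable name="vText" select="'high blood-pressure, he said.'"/>

  <!-- Step 1: delete every letter, leaving only the non-letters -->
  <xsl:variable name="vNonAlpha" select="translate($vText, $vLower, '')"/>

  <!-- Step 2: map each of those non-letters to a space -->
  <xsl:variable name="vSpaced"
   select="translate($vText, $vNonAlpha, $vSpaces)"/>

  <!-- Step 3: collapse runs of spaces, then pad both ends -->
  <xsl:variable name="vPadded"
   select="concat(' ', normalize-space($vSpaced), ' ')"/>

  <xsl:value-of select="concat('[', $vPadded, '] ')"/>
  <xsl:value-of select="contains($vPadded, ' blood pressure ')"/>
 </xsl:template>
</xsl:stylesheet>
```

This should print `[ high blood pressure he said ] true`. Note that translate() removes any character of its second argument that has no counterpart in the third, so a space string that is too short would silently delete separators (joining neighbouring words) instead of replacing them.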