Pure XPath 1.0 solution (no extension functions):
//a[starts-with(@href, 'http://biz.yahoo.com/ic/')
  and
    substring(@href, string-length(@href)-4) = '.html'
  and
    string-length(
      substring-before(
        substring-after(@href, 'http://biz.yahoo.com/ic/'),
        '.')
    ) = 3
  and
    translate(
      substring-before(
        substring-after(@href, 'http://biz.yahoo.com/ic/'),
        '.'),
      '0123456789',
      ''
    ) = ''
   ]
This XPath expression can be "read in English" like this:
Select any a element in the document whose href attribute has a string value that starts with "http://biz.yahoo.com/ic/" and ends with ".html", where the substring between that prefix and the first "." that follows it has a length of 3 and consists only of digits.
XSLT-based verification:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//a[starts-with(@href, 'http://biz.yahoo.com/ic/')
     and
       substring(@href, string-length(@href)-4) = '.html'
     and
       string-length(
         substring-before(
           substring-after(@href, 'http://biz.yahoo.com/ic/'),
           '.')
       ) = 3
     and
       translate(
         substring-before(
           substring-after(@href, 'http://biz.yahoo.com/ic/'),
           '.'),
         '0123456789',
         ''
       ) = ''
      ]
  "/>
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<html>
  <body>
    <a href="http://biz.yahoo.com/ic/123.html">Link1</a>
    <a href="http://biz.yahoo.com/ic/1234.html">Incorrect</a>
    <a href="http://biz.yahoo.com/ic/x23.html">Incorrect</a>
    <a href="http://biz.yahoo.com/ic/621.html">Link2</a>
  </body>
</html>
the XPath expression is evaluated and the selected nodes are copied to the output:
<a href="http://biz.yahoo.com/ic/123.html">Link1</a>
<a href="http://biz.yahoo.com/ic/621.html">Link2</a>
As we can see, only the wanted a elements have been selected.
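One edge case that the test document above does not exercise: substring-before() stops at the first "." after the prefix, so an href such as http://biz.yahoo.com/ic/123.foo.html would also satisfy every condition. If that matters, a stricter variant can pin the total length of the href instead. The stylesheet below is only a sketch of that idea. The numbers 32, 25 and 28 are my own derivation, not something stated above: the prefix 'http://biz.yahoo.com/ic/' is 24 characters long, followed by 3 digits and the 5 characters of '.html'.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Sketch of a stricter test. Assumed layout of a wanted href:
        positions  1..24  the prefix 'http://biz.yahoo.com/ic/'
        positions 25..27  exactly three digits
        positions 28..32  the suffix '.html'
      so the whole string must be 32 characters long. -->
 <xsl:template match="/">
  <xsl:copy-of select=
  "//a[starts-with(@href, 'http://biz.yahoo.com/ic/')
     and
       string-length(@href) = 32
     and
       substring(@href, 28) = '.html'
     and
       translate(substring(@href, 25, 3), '0123456789', '') = ''
      ]
  "/>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the same XML document, this should again copy only the Link1 and Link2 elements, and it should additionally reject an href with extra text between the digits and ".html", because such a string is longer than 32 characters.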