
A sample XML file is:

<book category="lovestory">
    <title lang="en">Сумерки</title>
    <author>Stephanie Meyer</author>
    <year>2003</year>
    <price>50.07</price>
</book>

So far I have something like this XPath string:

xpath.compile("/book[/title='[a-zA-z0-9]+']/author");

How can I get the authors of all English books? (By "English" I mean that the title is written in Latin characters, not Cyrillic. The title in the sample above is Russian.)

easydush
  • John le Carré is English. You can't assume that if something contains letters outside the range a-z then it's not English. – Michael Kay Sep 17 '19 at 22:58

1 Answer


Your XPath is almost correct. Try the following XPath:

//book//title[@lang='en']//..//author

Explanation:

//book//title[@lang='en'] selects every title whose lang attribute is en. From there, //..//author steps back up to the enclosing book and takes its author.
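
Since the question calls xpath.compile, here is a minimal Java sketch of how that expression could be evaluated with the JDK's javax.xml.xpath API. The class name LangAuthors, the helper selectText, the <library> wrapper element and the second book are all made up for illustration. They are not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LangAuthors {
    // Hypothetical sample data: a <library> root is assumed so that
    // more than one <book> fits into a single document.
    static final String SAMPLE =
        "<library>"
      + "<book><title lang=\"en\">Twilight</title><author>Stephanie Meyer</author></book>"
      + "<book><title lang=\"ru\">War and Peace</title><author>Leo Tolstoy</author></book>"
      + "</library>";

    // The expression from the answer, unchanged.
    static final String BY_LANG = "//book//title[@lang='en']//..//author";

    // Evaluates an XPath 1.0 expression and returns the text of each selected node.
    static List<String> selectText(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectText(SAMPLE, BY_LANG));
    }
}
```

If title and author are always direct children of book, as in the sample, //book[title/@lang='en']/author should select the same nodes with less wandering through the tree.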


If you cannot rely on the lang attribute, you can use a regular expression, as you tried in your example. This requires the matches function, which is available from XPath 2.0 onward:

//book//title[matches(text(), '[a-zA-Z0-9]+')]//..//author

A single slash / selects nodes that are direct children of the current node.

A double slash // selects matching descendants of the current node at any depth in the XML tree.
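
One caveat if you stay with plain Java: the javax.xml.xpath implementation bundled with the JDK supports XPath 1.0 only, so matches is not available there without an XPath 2.0 processor. A rough XPath 1.0 workaround is to strip every allowed character with translate() and check that nothing is left. The sketch below assumes that "Latin" means ASCII letters, digits and spaces. That character list, the class name LatinTitleAuthors, the <library> wrapper and the second book are my own assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LatinTitleAuthors {
    // Assumption: these are the only characters a "Latin" title may contain.
    static final String ALLOWED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ";

    // translate() deletes every allowed character from the title. An empty
    // remainder converts to false, so not(...) keeps only all-Latin titles.
    static final String LATIN_ONLY =
        "//book[not(translate(title, '" + ALLOWED + "', ''))]/author";

    // Hypothetical sample data with an assumed <library> root. The first title is
    // the Russian word "Sumerki" in Cyrillic, written as Unicode escapes.
    static final String SAMPLE =
        "<library>"
      + "<book><title lang=\"en\">\u0421\u0443\u043c\u0435\u0440\u043a\u0438</title>"
      + "<author>Stephanie Meyer</author></book>"
      + "<book><title lang=\"en\">Pride and Prejudice</title>"
      + "<author>Jane Austen</author></book>"
      + "</library>";

    // Returns the authors of all books whose title uses only ALLOWED characters.
    static List<String> latinTitleAuthors(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    LATIN_ONLY, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> authors = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                authors.add(nodes.item(i).getTextContent());
            }
            return authors;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + LATIN_ONLY, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(latinTitleAuthors(SAMPLE));
    }
}
```

Unlike the unanchored regex, this rejects a title as soon as it contains a single character outside the list, so punctuation or accented letters would have to be added to ALLOWED by hand. A book with an empty title also passes the check.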

AndiCover
  • Note that `[A-z]` matches more than `[a-zA-Z]` https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret – The fourth bird Sep 17 '19 at 19:11
  • There is also an XPath 1.0 solution: `//book[not(translate(title,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',''))]` – Alejandro Sep 17 '19 at 20:57