Trouble using XPath "starts-with" to parse XHTML

Question

I'm trying to parse a web page to get posts from a forum.
Each message starts with the following format:

<div id="post_message_somenumber">

and I only want to get the first one.

I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this; does anyone have suggestions?

Good question, +1. See my answer for two possible causes of the problem and for a solution. — Dimitre Novatchev, Feb 01 '11 at 05:32
The problem is with quotes and (perhaps secondarily) the value of the `id` (it doesn't start with a double quote). You want something like `xpath='//div[starts-with(@id, "post_message_")]'` — salathe, Feb 01 '11 at 07:46
I don't know what yql is, but I suspect the issue is with how you write an XPath expression containing quotes and then embed it or escape it in your host language environment. — Michael Kay, Feb 01 '11 at 09:45
Thanks for the responses. salathe, your suggestion worked. YQL is Yahoo Query Language and, along with Yahoo Pipes, is a good way for people who don't know programming to learn how to parse web pages, combine RSS feeds, etc. — bigbucky, Feb 01 '11 at 20:53
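To see both of the causes salathe names, here is a minimal sketch in Java (not YQL; Java is used only for illustration) with the JDK's built-in javax.xml.xpath API. The sample markup and the class and method names are hypothetical. When the host language quotes its strings with double quotes, the XPath string literal can use single quotes. A stray double quote left inside the literal becomes part of the prefix, so no id matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StartsWithQuoting {
    // Hypothetical sample page; a real forum page would first have to be fetched and made well-formed.
    static final String SAMPLE =
        "<html><body>"
        + "<div id=\"post_message_101\">first</div>"
        + "<div id=\"post_message_102\">second</div>"
        + "</body></html>";

    // Parses SAMPLE and returns how many nodes the expression selects.
    static int countMatches(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The Java string uses double quotes, so the XPath string literal uses single quotes.
        System.out.println(countMatches("//div[starts-with(@id, 'post_message_')]"));
        // Here the literal is "post_message_ (with a leading double quote), which no id starts with.
        System.out.println(countMatches("//div[starts-with(@id, '\"post_message_')]"));
    }
}
```

Running main should print 2 and then 0.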

Vimes · Answer 1 · 2012-08-17T23:28:18.063

I think I have a solution that does not require dealing with namespaces.

Here is one that selects all matching divs:

//div[@id[starts-with(.,"post_message")]]

But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:

(//div[@id[starts-with(.,"post_message")]])[1]

These use the dot (the context node, here the id attribute) to refer to the id's value inside the starts-with() function. You may have to escape quotes or other special characters in your host language.
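A minimal Java sketch of both points, using the JDK's javax.xml.xpath on hypothetical markup in which each post sits in its own <td>. It checks that the dot form selects the same divs as the more common starts-with(@id, ...) form, and shows why the parentheses in (...)[1] matter: without them, [1] is applied to each parent's children separately, so every <td> can contribute its own "first" div.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstMatch {
    // Hypothetical markup: each post is wrapped in its own <td>.
    static final String SAMPLE =
        "<table><tr>"
        + "<td><div id=\"post_message_1\"/></td>"
        + "<td><div id=\"not_post_message\"/></td>"
        + "<td><div id=\"post_message_2\"/></td>"
        + "</tr></table>";

    // Evaluates the expression against SAMPLE; results come back in document order.
    static NodeList select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Prints the expression followed by the ids of the selected divs.
    static void show(String expression) throws Exception {
        NodeList nodes = select(expression);
        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.append(((Element) nodes.item(i)).getAttribute("id")).append(' ');
        }
        System.out.println(expression + " -> " + ids.toString().trim());
    }

    public static void main(String[] args) throws Exception {
        // The dot form and the attribute-argument form select the same divs.
        show("//div[@id[starts-with(., 'post_message')]]");
        show("//div[starts-with(@id, 'post_message')]");
        // Parentheses apply [1] to the whole result: the first match in document order.
        show("(//div[starts-with(@id, 'post_message')])[1]");
        // Without them, [1] is evaluated per parent, so each <td>'s first matching div qualifies.
        show("//div[starts-with(@id, 'post_message')][1]");
    }
}
```

The first two lines should both list post_message_1 and post_message_2; the parenthesized query lists only post_message_1, while the unparenthesized one lists both again.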

It works great for me in PowerShell:

# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'

# Run the xpath selection of all matching div's
$xml.SelectNodes('//div[@id[starts-with(.,"post_message")]]')

Result:

id
--
post_message_somenumber
post_message_somenumber2

Or, for just the first match:

# Run the xpath selection of the first matching div
$xml.SelectNodes('(//div[@id[starts-with(.,"post_message")]])[1]')

Result:

id
--
post_message_somenumber

score 5 · Answer 2 · edited May 23 '17 at 12:30

I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions

If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.

Specifying the names of elements that are in a default namespace is the most frequently asked question about XPath. If you search for "XPath default namespace" on Stack Overflow or on the internet, you'll find many sources with the correct solution.

Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".

Here is a good answer on how to do this in C#.

Read the documentation of your language or XPath engine to see how the same thing is done in your specific environment.
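For example, in Java with the JDK's javax.xml.xpath, the binding is made by passing a NamespaceContext to XPath.setNamespaceContext. The sketch below is a minimal illustration under assumptions: the page is well-formed XHTML in the default namespace http://www.w3.org/1999/xhtml, and the prefix x and the class name are arbitrary choices. The parser must also be namespace-aware for the binding to have any effect.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlNamespace {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML page: the default namespace puts every <div> in XHTML_NS.
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\"><body>"
        + "<div id=\"post_message_1\">first</div>"
        + "<div id=\"post_message_2\">second</div>"
        + "</body></html>";

    // Binds the prefix "x" (an arbitrary choice) to the XHTML namespace.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI)
                ? Collections.singletonList("x").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, the DOM carries no namespace information
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // In XPath 1.0 an unprefixed element name means "no namespace", so this finds nothing.
        System.out.println(count("//div[starts-with(@id, 'post_message_')]"));
        // With the bound "x:" prefix, the same test finds both posts (unprefixed @id is still fine).
        System.out.println(count("//x:div[starts-with(@id, 'post_message_')]"));
    }
}
```

The first query should print 0 and the second 2.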

score 1 · Answer 3 · edited Dec 14 '16 at 15:42


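// A List field receives every matching element; a single WebElementFacade field would hold only the first
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private List<WebElementFacade> listOfExpiredUserDetails;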

This one gives a list of all div elements on the page whose id starts with expiredUserDetails and whose text contains "Details".
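The same predicate can be tried outside Selenium/Serenity with the JDK's javax.xml.xpath, as in the minimal sketch below (the markup and class name are hypothetical). It also highlights an XPath 1.0 detail: contains(text(), 'Details') converts the text() node-set to a string using only its first text node, so text that follows a child element is ignored, whereas contains(., 'Details') tests the element's full string value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CombinedPredicate {
    // Hypothetical markup; in the second div, "Details" comes after a <b> child element.
    static final String SAMPLE =
        "<body>"
        + "<div id=\"expiredUserDetails_1\">Details for user 1</div>"
        + "<div id=\"expiredUserDetails_2\">User 2: <b>Name</b> Details</div>"
        + "<div id=\"activeUserDetails\">Details</div>"
        + "</body>";

    // Parses SAMPLE and returns how many nodes the expression selects.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // text() in contains() uses only the first text node ("User 2: "), so the second div is missed.
        System.out.println(count(
            "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]"));
        // '.' is the div's full string value, so both expired-user divs match.
        System.out.println(count(
            "//div[starts-with(@id,'expiredUserDetails') and contains(., 'Details')]"));
    }
}
```

It should print 1, then 2; the activeUserDetails div is excluded in both cases by starts-with().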

answered Dec 14 '16 at 13:39 by jaxy · edited Dec 14 '16 at 15:42 by bofredo
