8

I have this HTML/XML:

\t\t\t\t\t    \r\n\t\t
<a href="/test.aspx">
  <span class=test>
    <b>blabla</b>
  </span>
</a>
<br/>
this is the text I want
<br/>
<span class="test">
  <b>code: 123</b>
</span>
<br/>
<span class="test"></span>
\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t

In C# 4 I use the HtmlAgilityPack library to select the node with XPath and read its InnerText property. This returns all the text inside the node, including the text of its descendants. How can I get only the text "this is the text I want"?

/text() only returns \t\t\t\t\t \r\n\t\t

John Saunders
peter

3 Answers

15
/div/text()

From the example given, this XPath selects all text nodes that are direct children of the div element, in this case test2.

If you could elaborate on the question, we might be better able to help you. The div contains 3 children: a span element, a text node and a b element. The span and b each have a text node child. Using XPath you could select elements only (/div/*), text nodes only (/div/text()) or all node types (/div/node()).
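
As a minimal sketch of those three selections, assuming HtmlAgilityPack is referenced. The fragment is hypothetical: only the text node test2 comes from the example, and the span and b contents are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class NodeTypeDemo
{
    static void Main()
    {
        // Hypothetical fragment: only "test2" is taken from the example above.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><span>test1</span>test2<b>test3</b></div>");

        // Elements only, text nodes only, then every child node type.
        foreach (var xpath in new[] { "/div/*", "/div/text()", "/div/node()" })
        {
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
            Console.WriteLine(xpath);
            if (nodes == null) continue; // SelectNodes returns null when nothing matches
            foreach (HtmlNode node in nodes)
                Console.WriteLine("  " + node.NodeType + ": " + node.OuterHtml);
        }
    }
}
```

Printing NodeType next to each match makes it easy to see which expression returned elements and which returned text nodes.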

EDIT: /text() only returns the top-level text nodes, that is, the direct children of the selected node. In this case I would expect it to return a node list containing 3 text nodes:

\t\t\t\t\t    \r\n\t\t 
this is the text I want
\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t

Are you perhaps only selecting the first node in the resulting node list? There are also a few well-formedness issues; for example, your <br> should probably be <br/>.
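
The difference between taking the first match and taking the whole node list can be sketched as follows. This is only an illustration, assuming HtmlAgilityPack is referenced and that the fragment sits inside a div (the question does not show the real parent element); the class name is made up.

```csharp
using System;
using HtmlAgilityPack;

class TextNodeDemo
{
    static void Main()
    {
        // Assumption: the fragment sits inside a <div>; the question does not
        // show the real parent element.
        const string html =
            "<div>\t\t\r\n\t" +
            "<a href=\"/test.aspx\"><span class=\"test\"><b>blabla</b></span></a>" +
            "<br/>this is the text I want<br/>" +
            "<span class=\"test\"><b>code: 123</b></span><br/>" +
            "<span class=\"test\"></span>\t\t\r\n\t</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode yields only the first match, here the leading whitespace.
        HtmlNode first = doc.DocumentNode.SelectSingleNode("/div/text()");
        Console.WriteLine("first only: [" + first.InnerText.Trim() + "]");

        // SelectNodes yields every matching text node (null if there are none).
        HtmlNodeCollection all = doc.DocumentNode.SelectNodes("/div/text()");
        if (all != null)
        {
            foreach (HtmlNode node in all)
            {
                string text = node.InnerText.Trim();
                if (text.Length > 0)
                    Console.WriteLine("text: " + text);
            }
        }
    }
}
```

Skipping whitespace-only nodes after Trim() should leave just the wanted text, without changing the XPath itself.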

Chris Cameron-Mills
  • Hi, please see my edit. Do you have any idea why it does not return all the text? – peter Oct 06 '10 at 13:44
  • Hi, I was using SelectSingleNode, which is why it was returning only \t\t\t\t\t. I should have used SelectNodes... doh. Thanks – peter Oct 06 '10 at 13:59
  • No probs, glad you got to the bottom of it :) – Chris Cameron-Mills Oct 06 '10 at 14:07
  • How does this answer relate to the question? –  Oct 06 '10 at 14:23
  • Oh! Sorry. @peter: don't change the question. Good practice is to ask a new question, otherwise other people will not benefit from the answer. –  Oct 06 '10 at 14:54
  • Incorrect. As mentioned in another comment, the original OP was vague and warranted my original answer. The OP was updated with a more complete fragment. My answer was updated to take this into account and my suggestion (which I have now made bold) that he was not getting the entire nodelist (containing the text he wanted) turned out to be the solution. Thus my answer was accepted. – Chris Cameron-Mills Oct 06 '10 at 15:15
0

How can I get only the text "this is the text I want"?

text()[preceding-sibling::node()[1][self::br]]
      [following-sibling::node()[1][self::br]]

Meaning: the text node whose immediately preceding sibling and immediately following sibling are both br elements.
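
A minimal sketch of evaluating this expression with HtmlAgilityPack, relative to the parent node. The div wrapper and the shortened fragment are assumptions made for illustration, not the asker's actual markup.

```csharp
using System;
using HtmlAgilityPack;

class BetweenBreaksDemo
{
    static void Main()
    {
        // Assumption: a <div> wrapper stands in for the real parent element.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div>\r\n<a href=\"/test.aspx\">link</a><br/>\r\n" +
            "this is the text I want\r\n<br/><span class=\"test\"></span>\r\n</div>");

        HtmlNode div = doc.DocumentNode.SelectSingleNode("/div");

        // Relative XPath, evaluated against the div's children.
        HtmlNode between = div.SelectSingleNode(
            "text()[preceding-sibling::node()[1][self::br]]" +
            "[following-sibling::node()[1][self::br]]");

        // SelectSingleNode returns null when no node matches.
        if (between != null)
            Console.WriteLine(between.InnerText.Trim());
    }
}
```

Because the predicate tests the nearest sibling on each side, the whitespace-only text nodes elsewhere in the fragment should not match.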

0

@peter: You should not edit your question so that people don't see how the accepted answer relates to the question!!!

The answer to your new question:

/br[1]/following-sibling::text()[1]

selects the wanted text node (the quotes are mine):

"   
this is the text I want   
"
Dimitre Novatchev
  • What question? I'm not the OP. I suggested an answer to the original (vague) question. The OP updated his question with a more complete fragment of HTML, and I updated my answer (see the EDIT: section) to cover the new example. In the end, it wasn't even the XPath that was incorrect: he was picking a single node (the first from the list) instead of the entire node list in C# – Chris Cameron-Mills Oct 06 '10 at 15:02