0

I have an HtmlNode with the following InnerHtml:

<a>SomeText</a>
DividerText:
<br>
TextToSelect1
<br/>
TextToSelect2
<br/>
TextToSelect3
<br>
TextToSelect4

Is it possible to select all the 'TextToSelect' values using only XPath, without C# Split or Regex?

Something like this: /text()/substring-after('DividerText:')

Or how can I get the InnerHtml excluding the a tag?

Jens Erat
Bogdan Kolodii

2 Answers

2

You can get all text nodes that follow a BR after the DividerText like this (in a sample console app):

  HtmlDocument doc = new HtmlDocument();
  doc.Load(MyTestHtm);

  foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[contains(., 'DividerText:')]/following-sibling::br/following-sibling::text()"))
  {
      Console.WriteLine(node.InnerText.Trim());
  }

This prints:

TextToSelect1
TextToSelect2
TextToSelect3
TextToSelect4

The XPath expression first finds, anywhere in the document, a text() node that contains the 'DividerText:' token, then selects all BR elements that are following siblings of that node, and finally selects all text nodes that are following siblings of those BR elements.
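For reference, here is a self-contained sketch of the same query that can be pasted into a console project. It assumes the HtmlAgilityPack NuGet package is referenced; the wrapping div and the inline HTML string are illustrative stand-ins for the MyTestHtm file, not part of the original snippet. It also guards against SelectNodes returning null, which HtmlAgilityPack does when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Markup from the question, wrapped in a hypothetical <div> so the
        // text nodes and <br> elements share one parent.
        const string html =
            "<div><a>SomeText</a>\nDividerText:\n<br>\nTextToSelect1\n<br/>\nTextToSelect2\n" +
            "<br/>\nTextToSelect3\n<br>\nTextToSelect4\n</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses a string; Load reads a file or stream

        // Same XPath 1.0 expression as in the snippet above.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//text()[contains(., 'DividerText:')]/following-sibling::br/following-sibling::text()");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("No matching text nodes.");
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            // Text nodes carry the surrounding line breaks, so trim them
            // and skip nodes that are whitespace only.
            string text = node.InnerText.Trim();
            if (text.Length > 0)
            {
                Console.WriteLine(text);
            }
        }
    }
}
```

Because the last step yields a node-set, a text node that follows several BR elements is still returned only once.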

Simon Mourier
-1

To select all text nodes following in the document:

//text()[contains(., 'DividerText:')]//following::text()

To select all sibling text nodes (those that follow on the same level inside a wrapping element):

//text()[contains(., 'DividerText:')]//following-sibling::text()
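The difference between the two axes shows up once the markup is nested. The sketch below runs both expressions against a small made-up fragment (the p and span elements and the A/B/C texts are purely illustrative) and assumes HtmlAgilityPack is referenced. By XPath semantics, following should reach every later text node in document order, while following-sibling should stay inside the parent of the divider text node.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    // Runs an XPath query and joins the trimmed, non-empty text results.
    static string Run(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return "(no match)";
        }
        return string.Join(", ", nodes
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Length > 0));
    }

    static void Main()
    {
        // Hypothetical nested markup: "B" sits inside a <span>,
        // and "C" sits outside the <p> that holds the divider.
        const string html = "<div><p>DividerText:<br>A<span>B</span></p>C</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // All later text nodes in document order, at any depth.
        Console.WriteLine("following:         " +
            Run(doc, "//text()[contains(., 'DividerText:')]//following::text()"));

        // Only text nodes that share a parent with the divider text node.
        Console.WriteLine("following-sibling: " +
            Run(doc, "//text()[contains(., 'DividerText:')]//following-sibling::text()"));
    }
}
```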

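If the text you need starts directly after the divider, inside the same text node, you need XPath 2.0. The following query also returns the part after the divider string, but it uses a function call as a path step, a sequence expression, and data(), none of which are available in XPath 1.0 (substring-after itself does exist in XPath 1.0):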

//text()[contains(., 'DividerText:')]//(substring-after(., 'DividerText:'), following::text()/data())

If you're able to use XPath 2.0 or newer, you can also join all text nodes with string-join and apply substring-after to the result (the explicit '' separator is required in XPath 2.0 and optional from 3.0 on):

substring-after(string-join(//text(), ''), 'DividerText:')

You could also use //text() to fetch all text nodes and then apply an equivalent of substring-after() in C#. You might have to concatenate the resulting set/array first.
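A minimal sketch of that C# fallback, assuming HtmlAgilityPack is used as in the question. As far as I know, HtmlAgilityPack evaluates XPath through .NET's System.Xml.XPath, which implements XPath 1.0 only, so the XPath 2.0 queries above would not run there. The SubstringAfter helper is a hypothetical name introduced here; it mirrors the XPath function by returning an empty string when the divider is missing.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper mirroring XPath's substring-after():
    // returns "" when the divider does not occur in the input.
    static string SubstringAfter(string input, string divider)
    {
        int index = input.IndexOf(divider, StringComparison.Ordinal);
        return index < 0 ? string.Empty : input.Substring(index + divider.Length);
    }

    static void Main()
    {
        // Markup from the question, wrapped in an illustrative <div>.
        const string html =
            "<div><a>SomeText</a>\nDividerText:\n<br>\nTextToSelect1\n<br/>\nTextToSelect2\n" +
            "<br/>\nTextToSelect3\n<br>\nTextToSelect4\n</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Fetch every text node; SelectNodes returns null when nothing matches.
        HtmlNodeCollection textNodes = doc.DocumentNode.SelectNodes("//text()");
        string allText = textNodes == null
            ? string.Empty
            : string.Concat(textNodes.Select(n => n.InnerText));

        // Keep only what follows the divider, then split on the line breaks
        // that the text nodes carried over from the markup.
        string[] values = SubstringAfter(allText, "DividerText:")
            .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(v => v.Trim())
            .Where(v => v.Length > 0)
            .ToArray();

        foreach (string value in values)
        {
            Console.WriteLine(value);
        }
    }
}
```

Note that this does rely on a C# Split at the end, which the question wanted to avoid. It is only needed because concatenating the text nodes loses the BR boundaries.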

Jens Erat