I am using HTML Agility Pack for the task below.

What XPath query returns all nodes that contain a search string? It should search both the attributes and the inner text of elements.

<HTML>
 <BODY >
  <H1>Mr T for president</H1>
   <div class="test">We believe the new president should be</div>
   <div id="test">the awsome Mr T</div>
   <div>
    <H2>Mr T replies:</H2>
     <p>test paragraph</p>
     <p class="test">for Mr T</p>
   </div>
  </BODY>
</HTML>

For example, how do I get all HTML elements that have `test` either in their attributes or in their inner text?

sunder

2 Answers

To find all element nodes that contain a given token in an attribute value or text node, you can use this:

//*[text()[contains(., 'token')] or @*[contains(., 'token')]]

Be aware that this will miss matches where the token is split across child markup, for example `foobar` in <p>foo<em>bar</em></p>, because each text node is tested separately.
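As a rough illustration, the expression can be run against the sample document from the question. This sketch uses Java's built-in `javax.xml.xpath` instead of HTML Agility Pack, so the class `TokenSearch`, its `find` helper, and the `$token` variable binding are assumptions made for the example, not part of the answer. It also relies on the sample happening to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class TokenSearch {
    // The sample markup from the question, with inter-element whitespace removed.
    static final String SAMPLE =
        "<HTML><BODY >"
        + "<H1>Mr T for president</H1>"
        + "<div class=\"test\">We believe the new president should be</div>"
        + "<div id=\"test\">the awsome Mr T</div>"
        + "<div><H2>Mr T replies:</H2><p>test paragraph</p>"
        + "<p class=\"test\">for Mr T</p></div>"
        + "</BODY></HTML>";

    // Returns the names of all elements that have a text node or an
    // attribute value containing the token (case-sensitive).
    static List<String> find(String xml, String token) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the token as the XPath variable $token so that quotes or
            // ampersands inside it need no escaping in the expression.
            xpath.setXPathVariableResolver(
                name -> "token".equals(name.getLocalPart()) ? token : null);
            String query =
                "//*[text()[contains(., $token)] or @*[contains(., $token)]]";
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two divs match on an attribute, one p on its text, one p on its class.
        System.out.println(find(SAMPLE, "test"));                     // [div, div, p, p]
        // The token is split across two text nodes, so nothing matches.
        System.out.println(find("<p>foo<em>bar</em></p>", "foobar")); // []
    }
}
```

The wrapper `div` and the headings are not returned: the wrapper has no attributes and no text node of its own, and the headings do not contain the token.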

Jens Erat
  • That worked after a small alteration, as you missed a ] there. I have another question: with the search term !@#$%^&(.txt the result comes back as null. Why? – sunder Mar 27 '14 at 12:52
  • Thank you for pointing out the typo, fixed that. I don't see any direct issue with your search token, but the ampersand could be a problem. Is it correctly escaped in your HTML input? – Jens Erat Mar 27 '14 at 13:08
  • Yes, it is, but I get a null result for this search token. It is passed as '!@#$%^&(.txt', where the search token is a file name. – sunder Mar 28 '14 at 04:53
  • Try to replace `&` with its entity `&amp;`, and make sure it is also correctly encoded in the input. – Jens Erat Mar 28 '14 at 08:24

You can try this XPath to match the keyword 'test' against an element's inner text or attribute values:

//*[contains(text(), 'test') or @*[contains(., 'test')]]
har07
  • This will fail for all elements with multiple text nodes, as `contains($string, $needle)` accepts only a single string as input. – Jens Erat Mar 27 '14 at 13:11
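
The difference between the two expressions can be checked on an element with more than one text node. This is a minimal sketch using Java's built-in XPath 1.0 engine, not HTML Agility Pack; the class `TextNodeComparison` and its `count` helper are hypothetical names for the example. Under XPath 1.0, a node-set passed where a string is expected is converted using only its first node, so `contains(text(), 'test')` looks at the first text node alone.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison harness, named for this example only.
class TextNodeComparison {
    // Tests every text node individually (first answer).
    static final String PER_TEXT_NODE =
        "//*[text()[contains(., 'test')] or @*[contains(., 'test')]]";
    // Passes the whole text() node-set to contains() (second answer).
    static final String FIRST_TEXT_NODE_ONLY =
        "//*[contains(text(), 'test') or @*[contains(., 'test')]]";

    // Counts the elements selected by the query in the given XML string.
    static int count(String query, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The <p> has two text nodes: "intro" and "the test follows".
        String xml = "<p>intro<br/>the test follows</p>";
        System.out.println(count(PER_TEXT_NODE, xml));        // 1
        System.out.println(count(FIRST_TEXT_NODE_ONLY, xml)); // 0
    }
}
```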