Update: For those who mark this question as a duplicate: I am searching for text that may be contained in a single element or spread over 100 elements, and I do not know which before searching. All I know is that the words in the pattern I'm searching for came from this HTML. I need a search that skips (but remembers) the HTML/JavaScript that may be interspersed with the text I'm looking for.
I hope this explanation helps find an answer to my question.
*********** End of Update ***************
I am looking for a library or a piece of code that can search for arbitrary plain text inside an HTML document and report where it was found (start/stop offsets or tags).
Example:
- pattern to look for: "text that I'm looking for"
- html document:
<html>...<p>text that <b>I'm</b/> <span>looking for<div>...</div>...</p>
- resulting match:
text that <b>I'm</b/> <span>looking for
Does anyone know of such a utility? Thanks.
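To make the desired behaviour concrete, here is a minimal Python sketch of the kind of search described above. It is only an illustration under stated assumptions, not an existing library: the function name `find_text_in_html` is hypothetical, and tags are recognised with a naive regular expression, so entities, comments, and the contents of `<script>` elements are not handled. The idea is to strip the markup while remembering, for every visible character, its offset in the original HTML, then search the plain text and map the match back.

```python
import re

# Naive assumption: anything between "<" and ">" is markup.
TAG_RE = re.compile(r"<[^>]*>")


def find_text_in_html(html, pattern):
    """Return (start, end) offsets into html covering the first match
    of pattern in the visible text, or None if there is no match.
    Hypothetical helper for illustration only."""
    if not pattern:
        return None

    text_chars = []  # visible characters, markup skipped
    offsets = []     # offsets[i] is the index in html of text_chars[i]
    pos = 0
    for tag in TAG_RE.finditer(html):
        # Copy the text between the previous tag and this one.
        for i in range(pos, tag.start()):
            text_chars.append(html[i])
            offsets.append(i)
        pos = tag.end()
    # Copy any text after the last tag.
    for i in range(pos, len(html)):
        text_chars.append(html[i])
        offsets.append(i)

    idx = "".join(text_chars).find(pattern)
    if idx == -1:
        return None

    # Map the first and last matched characters back to html offsets.
    start = offsets[idx]
    end = offsets[idx + len(pattern) - 1] + 1
    return start, end


html = "<html>...<p>text that <b>I'm</b/> <span>looking for<div>...</div>...</p>"
span = find_text_in_html(html, "text that I'm looking for")
if span is not None:
    print(html[span[0]:span[1]])
```

On the example document this prints the span given as the expected match, with the skipped tags still inside it. A real solution would presumably need a proper HTML parser in place of the regular expression, plus some normalisation of whitespace and entities.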