I want to implement search and highlighting of multiple phrases in HTML files in a Java desktop application, the way web browsers do it: HTML tags (anything between < and >) are skipped over, but some tags are not. Inline tags like <b> are ignored, while tags like <p> break the text. For example, when searching for "each table", the text ...each <b>table</b> has name... should be highlighted, but the text ...has each</p><p> Table is... should not, because the <p> tag interrupts the meaning of the text.
Web browsers implement this somehow. How can I get at that implementation? Or is there some source for it on the net? I tried Google, but without success :(

4 Answers
Instead of searching inside the raw HTML file, browsers search the rendered output of that HTML.
Get a suitable HTML renderer, take its output as text, and search that text with an appropriate string-searching algorithm.
In the second example from your question, the </p><p> boundary becomes a line break in the rendered output, so an ordinary string search for "each table" will not match across it, which is exactly the behavior you expect.
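A minimal sketch of the search step, assuming you already have the rendered text. How you obtain it depends on the renderer; the two strings in main are hand-written stand-ins for renderer output, not real output. The class RenderedTextSearch and its method findAll are hypothetical. It accepts several phrases, since the question asks for multiple ones, and returns [start, end) ranges to highlight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RenderedTextSearch {

    /**
     * Returns [start, end) ranges of every case-insensitive occurrence of any of the
     * phrases in already-rendered text (tags gone, block boundaries turned into line
     * breaks), sorted by start offset.
     */
    public static List<int[]> findAll(String renderedText, String... phrases) {
        List<int[]> hits = new ArrayList<>();
        for (String phrase : phrases) {
            int n = phrase.length();
            if (n == 0) {
                continue; // an empty phrase would match everywhere
            }
            for (int i = 0; i + n <= renderedText.length(); i++) {
                if (renderedText.regionMatches(true, i, phrase, 0, n)) {
                    hits.add(new int[] {i, i + n});
                }
            }
        }
        hits.sort(Comparator.comparingInt(r -> r[0]));
        return hits;
    }

    public static void main(String[] args) {
        // Assumed renderer output: <b> leaves no trace, </p><p> becomes a line break.
        String inline = "...each table has name...";
        String block = "...has each\nTable is...";
        System.out.println(findAll(inline, "each table", "name").size()); // 2
        System.out.println(findAll(block, "each table").size());          // 0
    }
}
```

Because the paragraph boundary is a line break in the assumed rendered text, "each table" does not match in the second string. A real renderer may also collapse runs of whitespace; this sketch compares characters exactly, apart from case.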

+1, thanks, so far the best answer, but I want an algorithm to do this in a desktop app... I can't believe nobody has ever tried this :) – Zavael Sep 16 '10 at 05:55
As Faisal stated, browsers search the rendered content only. To do the same, you'll need to remove the HTML tags before doing the actual search.
This code might help you: http://www.dotnetperls.com/remove-html-tags
Of course, you'll need to add some checks/exclusions, for example for script tags and other content that the browser does not render.
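A rough Java sketch of a character-scanning tag stripper along the lines of the linked article, with the exclusion mentioned above: the contents of <script> and <style> are dropped. One addition is an assumption, not part of this answer: block-level tags (the hypothetical BLOCK_TAGS set) are replaced by a line break, because plain tag removal would turn ...has each</p><p> Table is... into "has each Table is" and produce exactly the match the question wants to avoid. The class name TagStripper is hypothetical.

```java
import java.util.Locale;
import java.util.Set;

public class TagStripper {

    // Assumption: tags that end a line of rendered text; extend the set as needed.
    private static final Set<String> BLOCK_TAGS = Set.of(
            "p", "br", "div", "li", "tr", "td", "th", "h1", "h2", "h3", "h4", "h5", "h6");
    // Elements whose content a browser never shows as text.
    private static final Set<String> HIDDEN_TAGS = Set.of("script", "style");

    /** Removes tags, drops script/style content and turns block tags into '\n'. */
    public static String strip(String html) {
        StringBuilder out = new StringBuilder(html.length());
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {            // ordinary character: keep it
                out.append(c);
                i++;
                continue;
            }
            int close = html.indexOf('>', i);
            if (close < 0) {           // unterminated '<': keep the rest as text
                out.append(html, i, html.length());
                break;
            }
            String inner = html.substring(i + 1, close);
            String name = tagName(inner);
            boolean closing = inner.startsWith("/");
            if (!closing && HIDDEN_TAGS.contains(name)) {
                // Jump past the matching end tag; script bodies may contain '<' and '>'.
                // If there is no end tag, the rest of the input is dropped.
                int end = indexOfIgnoreCase(html, "</" + name, close + 1);
                int endClose = end < 0 ? -1 : html.indexOf('>', end);
                i = endClose < 0 ? html.length() : endClose + 1;
                continue;
            }
            if (BLOCK_TAGS.contains(name)) {
                out.append('\n');      // block boundary interrupts the text
            }
            i = close + 1;             // any other tag (e.g. <b>) is simply dropped
        }
        return out.toString();
    }

    /** Lower-cased tag name, e.g. "br" for "br /" or "p" for "/p". */
    private static String tagName(String inner) {
        int start = inner.startsWith("/") ? 1 : 0;
        int end = start;
        while (end < inner.length() && Character.isLetterOrDigit(inner.charAt(end))) {
            end++;
        }
        return inner.substring(start, end).toLowerCase(Locale.ROOT);
    }

    private static int indexOfIgnoreCase(String s, String target, int from) {
        for (int k = from; k + target.length() <= s.length(); k++) {
            if (s.regionMatches(true, k, target, 0, target.length())) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strip("...each <b>table</b> has name..."));
        System.out.println(strip("...has each</p><p> Table is..."));
    }
}
```

Entities such as &nbsp; and &amp; are not decoded, and comments containing > are not handled; a real HTML parser copes with these better.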

This seems pretty easy:
1) Search the text for the last word of the query.
2) Look at what's before that word.
3) Decide whether what's before it constitutes an interruption (<p>, <br />, <div>).
4) If it is an interruption, this occurrence is not a match; continue searching.
5) Else, evaluate the previous word against the search query.
I don't know if this is how browsers perform this operation, but this approach should work.
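The same idea can be expressed as one regular expression over the raw HTML instead of an explicit backward walk: between consecutive query words, allow whitespace and any tag except the interrupting ones. This is a regex variant of the steps above, not a literal transcription of them. A sketch under these assumptions: only p, br and div count as interruptions (the list from the answer), at least one whitespace character must separate words, and the class InterruptionAwareSearch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterruptionAwareSearch {

    // Assumption: the tags from the answer that interrupt the text; all others count as inline.
    private static final String BREAKING = "(?:p|br|div)";
    // An inline tag: '<', then anything up to '>', as long as the name is not a breaking tag.
    private static final String INLINE_TAG = "<(?!/?" + BREAKING + "\\b)[^>]*>";
    // What may sit between two query words: inline tags and whitespace, with at least one
    // whitespace character so that "each<b>table" is not mistaken for "each table".
    private static final String GAP = "(?:" + INLINE_TAG + ")*\\s(?:\\s|" + INLINE_TAG + ")*";

    /** Builds a case-insensitive pattern for a (possibly multi-word) phrase. */
    public static Pattern compile(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder re = new StringBuilder("\\b");
        for (int k = 0; k < words.length; k++) {
            if (k > 0) {
                re.append(GAP);
            }
            re.append(Pattern.quote(words[k]));
        }
        re.append("\\b");
        return Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns [start, end) offsets into the raw HTML for every match of the phrase. */
    public static List<int[]> find(String html, String phrase) {
        List<int[]> hits = new ArrayList<>();
        Matcher m = compile(phrase).matcher(html);
        while (m.find()) {
            hits.add(new int[] {m.start(), m.end()});
        }
        return hits;
    }

    public static void main(String[] args) {
        String inline = "...each <b>table</b> has name...";
        for (int[] hit : find(inline, "each table")) {
            System.out.println(inline.substring(hit[0], hit[1])); // each <b>table
        }
        System.out.println(find("...has each</p><p> Table is...", "each table").size()); // 0
    }
}
```

The offsets point into the raw HTML, so a match such as "each <b>table" spans a tag; to highlight it without breaking tag nesting, wrap each text run inside the range separately. Because the pattern uses \b, phrases that start or end with punctuation (e.g. C++) would need a different boundary check.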

So you suggest splitting the HTML text into pure-text parts and then searching within those parts? Or did I misunderstand you? – Zavael Sep 16 '10 at 05:53
Try using the javax.swing.text.html package in Java.

I know it's an old question, but could you share more info or an example of how to use it, for future visitors? – Zavael Aug 05 '15 at 09:18
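Since the comment asks for an example: a minimal sketch, assuming the HTML fits in a string, that loads it with HTMLEditorKit into an HTMLDocument (the model a JEditorPane displays) and searches the document's text. In that text, inline tags such as <b> leave no trace, while each paragraph ends with a line break, so the question's two examples should behave as wanted. The class SwingHtmlSearch and its methods are hypothetical; HTMLEditorKit, Document.getText and the "IgnoreCharsetDirective" document property that Swing's HTML reader honors are standard Swing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class SwingHtmlSearch {

    /** Parses HTML into the same HTMLDocument model a JEditorPane would display. */
    public static Document load(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // Don't abort if the page declares a charset in a <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e); // offset 0 is always valid
        }
        return doc;
    }

    /** Returns [start, end) document offsets of every case-insensitive match of phrase. */
    public static List<int[]> find(Document doc, String phrase) {
        String text;
        try {
            // Rendered text: no tags, a '\n' at the end of each paragraph.
            text = doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e); // cannot happen for the full range
        }
        Matcher m = Pattern.compile(Pattern.quote(phrase),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        List<int[]> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(new int[] {m.start(), m.end()});
        }
        return hits;
    }

    public static void main(String[] args) {
        Document inline = load("<html><body><p>...each <b>table</b> has name...</p></body></html>");
        Document block = load("<html><body><p>...has each</p><p> Table is...</p></body></html>");
        System.out.println(find(inline, "each table").size());
        System.out.println(find(block, "each table").size());
    }
}
```

In a desktop UI, calling find(pane.getDocument(), phrase) on a JEditorPane that shows the page yields offsets in that pane's own document, which can be passed to pane.getHighlighter().addHighlight(start, end, new DefaultHighlighter.DefaultHighlightPainter(Color.YELLOW)) to highlight each hit.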