I'd like to extract all the text from a valid HTML file. Example:
<div style="color: red;">text<span>another text</span>another text<img src="some_image"/></div>
How can I do that in Java?
As pointed out, regex is a bad idea. For parsing HTML, probably the best-known library is jsoup, and MK Yong has a very nice tutorial on it.
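To illustrate the parser-instead-of-regex idea without pulling in a dependency, here is a rough sketch that uses the HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`) rather than jsoup. The class name `TextExtractor` and the method `extractText` are made up for this example. The bundled parser is an old HTML 3.2-era one, so treat this as a starting point and not a replacement for a dedicated library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the class and method names are illustrative only.
class TextExtractor {

    // Collects every non-blank text chunk the parser reports, in document order.
    static List<String> extractText(String html) {
        List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };
        try {
            // true = ignore any charset declared in the markup; the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String html = "<div style=\"color: red;\">text<span>another text</span>"
                + "another text<img src=\"some_image\"/></div>";
        // Prints one text chunk per line.
        for (String chunk : extractText(html)) {
            System.out.println(chunk);
        }
    }
}
```

Tags and attributes never reach `handleText`, so only the text between tags is collected. With jsoup the equivalent is much shorter (`Jsoup.parse(html).text()` returns the combined text), and it copes better with modern or messy markup.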
Try Apache Tika: http://tika.apache.org/0.7/gettingstarted.html
Example using Tika for .html: "How can I use the HTML parser with Apache Tika in Java to extract all HTML tags?"