-1

I'd like to extract all the text from a well-formed HTML file. Example:

<div style="color: red;">text<span>another text</span>another text<img src="some_image"/></div>

How can I do that in Java?

Nyger

2 Answers

0

As pointed out, regex is a bad idea for this. The best-known Java library for parsing HTML is probably jsoup, and MK Yong has written a very nice tutorial on it.
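
To make the parser-instead-of-regex point concrete, here is a minimal sketch. Note that it is not jsoup: it uses the lenient HTML parser that ships with the JDK (javax.swing.text.html), so it runs without extra dependencies. The class name HtmlTextExtractor and the method extractText are made up for illustration. With jsoup on the classpath, the rough equivalent should be Jsoup.parse(html).text().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects every run of text the parser reports.
final class HtmlTextExtractor {

    /** Returns the text chunks of the given HTML in document order. */
    static List<String> extractText(String html) {
        List<String> chunks = new ArrayList<>();

        // The parser calls handleText once for each run of character data
        // it finds between tags; tags and attributes are simply ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };

        try {
            // The last argument (ignoreCharSet = true) tells the parser not to
            // fail on a charset declaration; the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory reader, so rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The example markup from the question.
        String html = "<div style=\"color: red;\">text<span>another text</span>"
                + "another text<img src=\"some_image\"/></div>";
        for (String chunk : extractText(html)) {
            System.out.println(chunk);
        }
    }
}
```

Running main prints each text chunk from the question's example on its own line. For real-world pages I would still reach for jsoup, since it copes better with modern markup than the old Swing parser does.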

lazyprogrammer
0

Try Apache Tika: http://tika.apache.org/0.7/gettingstarted.html

For an example of using Tika on .html files, see: How can I use the HTML parser with Apache Tika in Java to extract all HTML tags?

Dimitar Pavlov