I have an HTML text file with headings, and I would like to extract only the text inside them.
Example:
<h1 class="title"><a href="dtb.htm#rgn_txt_0001_0001">Fire Safety</a></h1>
<h1><a href="dtb.htm#rgn_txt_0002_0001">About this book</a></h1>
<h1><a href="dtb.htm#rgn_par_0002_0008">1</a></h1>
<h1><a href="dtb.htm#rgn_txt_0003_0001">Contents of this book</a></h1>
I would like to extract only the following text from the HTML:
Fire Safety, About this book, 1, Contents of this book
I tried a lot of things, such as:
Pattern pattern = Pattern.compile("<a[^>]href\\s=\\s*\"\\s*([^\"]*)");
Matcher matcher = pattern.matcher(input);
where input is the HTML data.
I didn't get any results on the console, or sometimes I got only the href value :(
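A likely explanation for both symptoms, assuming the input looks like the sample above. In the pattern, [^>] matches exactly one character, so it only works when a single character separates <a from href. Also, \\s= requires exactly one whitespace character before the equals sign, but the sample has href= with no space, so find() never succeeds. Even when the spacing is relaxed to \\s*=, capture group 1 holds the href value rather than the link text, which would explain getting only the href. This small program (the class name RegexDiagnosis is just illustrative) compares the two patterns on one sample line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDiagnosis {
    // One heading line taken from the sample input above.
    static final String SAMPLE =
        "<h1><a href=\"dtb.htm#rgn_txt_0002_0001\">About this book</a></h1>";

    // Returns capture group 1 of the first match, or null if the pattern never matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Original pattern: "\\s=" needs exactly one whitespace char before '=',
        // but the input contains href="..." with no space, so nothing matches.
        String original = "<a[^>]href\\s=\\s*\"\\s*([^\"]*)";
        System.out.println(firstGroup(original, SAMPLE)); // null

        // Relaxed to "\\s*=": now it matches, but group 1 is the href value, not the text.
        String relaxed = "<a[^>]href\\s*=\\s*\"\\s*([^\"]*)";
        System.out.println(firstGroup(relaxed, SAMPLE)); // dtb.htm#rgn_txt_0002_0001
    }
}
```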
How do I fix this?
Let me know! Thanks!
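A minimal regex-based sketch, assuming every heading is a plain h1 element like the ones in the sample (no nested h1, no h1 text inside comments or scripts). Instead of targeting the href attribute, it captures everything between <h1 ...> and </h1>, then strips any inner tags such as the <a> link. HeadingExtractor and extractHeadings are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadingExtractor {
    // Captures the content between an opening <h1 ...> tag and the next </h1>.
    // (?is): case-insensitive, and '.' also matches line breaks.
    private static final Pattern H1 = Pattern.compile("(?is)<h1\\b[^>]*>(.*?)</h1>");
    // Matches any single tag, e.g. <a href="..."> or </a>, so it can be removed.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static List<String> extractHeadings(String html) {
        List<String> headings = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            // Drop nested tags like the <a> link and keep only the visible text.
            String text = TAG.matcher(m.group(1)).replaceAll("").trim();
            headings.add(text);
        }
        return headings;
    }

    public static void main(String[] args) {
        String input =
            "<h1 class=\"title\"><a href=\"dtb.htm#rgn_txt_0001_0001\">Fire Safety</a></h1>\n"
          + "<h1><a href=\"dtb.htm#rgn_txt_0002_0001\">About this book</a></h1>\n"
          + "<h1><a href=\"dtb.htm#rgn_par_0002_0008\">1</a></h1>\n"
          + "<h1><a href=\"dtb.htm#rgn_txt_0003_0001\">Contents of this book</a></h1>";
        // Prints: Fire Safety, About this book, 1, Contents of this book
        System.out.println(String.join(", ", extractHeadings(input)));
    }
}
```

This sketch does not decode HTML entities such as &amp;, and it can break on malformed or unusual markup. For arbitrary pages, an HTML parser such as jsoup (Jsoup.parse(html).select("h1"), then text() on each element) is usually more robust than regular expressions.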