I have a large HTML file. I want to remove a specific span tag. The simplest case looks like this:
<span class=GramE> blah blah blah</span>
Output: blah blah blah
OR
<span class=a><span class=GramE>bla bla bla</span></span>
Output: <span class=a>bla bla bla</span>
The span may also appear in any other intermingled form. In every case, the text between ... should be preserved.
Actual HTML:
<td width=265 colspan=3 valign=top style='width:7.0cm;background:white;
padding:0cm 5.75pt 0cm 5.75pt'> <p class=MsoNormal style='margin-bottom:0cm;margin-bottom:.0001pt;text-align:justify;line-height:normal'><span class=GramE><span style='font-size:13.0pt'>(Here</span></span><span style='font-size:13.0pt'> Lorem ispsum. Lorem ispsum. Lorem ispsum. Lorem ispsum )</span></p>
</td>
I have tried the following code; however, replaceAll() doesn't seem to work. There are many intermingled span tags in my HTML that need this treatment. Please help me figure out where I am going wrong.
String filename = "file-location.html";
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    String line;
    String sb = "";
    while ((line = br.readLine()) != null) {
        String tmp = line.replaceAll("<span class=GramE[^>]*>/g", "");
        System.out.print(tmp);
    }
} catch (IOException e) {
    e.printStackTrace();
}
```
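
For reference, here is a rough sketch of one way the desired output could be produced. Two observations motivate it. First, Java regular expressions have no "/g" suffix, so in the pattern above "/g" is matched as literal text and the pattern never matches. Second, even without it, removing only the opening tag leaves the matching </span> behind. The sketch below walks through every span tag and uses a stack to pair each closing tag with its opening tag, dropping both only when the opening tag is class=GramE. The names GramEStripper and stripGramE are made up for illustration, and the code assumes a single unquoted or quoted class value and well-nested span tags. A real HTML parser would likely be more robust than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class GramEStripper {
    // Matches any opening or closing span tag.
    // Group 1 (the attribute text) is non-null only for opening tags.
    private static final Pattern SPAN_TAG =
            Pattern.compile("<span\\b([^>]*)>|</span\\s*>", Pattern.CASE_INSENSITIVE);

    // Matches class=GramE (optionally quoted) inside an opening tag's attributes.
    private static final Pattern GRAM_E =
            Pattern.compile("\\bclass\\s*=\\s*[\"']?GramE[\"']?(?=\\s|$)");

    static String stripGramE(String html) {
        // One entry per currently open span: true if that span is being dropped.
        Deque<Boolean> open = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = SPAN_TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            // Copy everything between the previous tag and this one unchanged.
            out.append(html, last, m.start());
            last = m.end();
            boolean drop;
            if (m.group(1) != null) {
                // Opening tag: remember whether it is a GramE span.
                drop = GRAM_E.matcher(m.group(1)).find();
                open.push(drop);
            } else {
                // Closing tag: it belongs to the most recently opened span.
                drop = !open.isEmpty() && open.pop();
            }
            if (!drop) {
                out.append(m.group());
            }
        }
        // Copy whatever follows the last span tag.
        out.append(html, last, html.length());
        return out.toString();
    }
}
```

Because tags and attributes can be split across lines (as in the td element above), it is probably safer to read the whole file into one string, for example with Files.readString(Path.of(filename)) on Java 11 or later, and pass that to stripGramE instead of processing line by line.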