I have a very large HTML file containing the results of a security scan, and I need to strip the useless information out of it. Here is an example of what I need to remove:
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>
After the edit, the block above should be gone entirely. A plain find-and-replace won't work, though, because the rows vary. Here is another example of what needs to be removed from the document:
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>
The ID number (10395 in the first example) needs to be treated as a variable, although its length stays the same. The description ("Microsoft Windows SMB Shares Enumeration" in the first example) also needs to be treated as a variable, since it changes throughout the document.
I tried putting something like this into find-and-replace, but I think I am totally missing the mark:
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>
Maybe I should be using a different tool altogether?
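One way the variation described above could be handled is with a regular expression instead of a literal search. The sketch below uses Python's `re` module, which is an assumption, since no tool is named above; `INFO_ROW` and `strip_info_rows` are hypothetical names. It also assumes every unwanted row has the same three-cell layout as the two examples, that the ID is always five digits, and that the description contains no nested tags. Note that `\1` in a regex does not mean "any digit": it refers back to whatever the first capture group matched, which is used here to require that the link text repeats the ID from the URL.

```python
import re

# Hypothetical pattern: one whole <tr>...</tr> row whose first cell says "Info".
# Assumes the three-cell layout shown in the examples above.
INFO_ROW = re.compile(
    r'<tr>\s*'
    # First cell: the severity label, which must be "Info".
    r'<td[^>]*class="classcell0"[^>]*>\s*<span[^>]*>Info</span>\s*</td>\s*'
    # Second cell: link whose id is five digits; \1 requires the link text
    # to repeat the same digits captured from the URL.
    r'<td[^>]*>\s*<a href="[^"]*[?&]id=(\d{5})"[^>]*>\s*\1\s*</a>\s*</td>\s*'
    # Third cell: the description, any text that contains no tags.
    r'<td[^>]*>\s*<span[^>]*>[^<]*</span>\s*</td>\s*'
    r'</tr>\s*'
)


def strip_info_rows(html: str) -> str:
    """Return html with every row matching INFO_ROW removed."""
    return INFO_ROW.sub("", html)
```

To use it, the file's contents would be read into a string, passed through `strip_info_rows`, and written back out. Rows with any other severity label are left alone because the first cell must contain exactly `Info`. If the real document has rows that deviate from the layout above, an HTML parser would probably be more robust than a regex.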