Select and replace multiple lines in Notepad++ using regex

Question

I have a very large HTML file containing the results of a security scan, and I need to strip the useless information out of it. An example of what I need to remove looks something like this:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

After the edit, the text above should simply be gone. I can't use a standard find, though, because the blocks vary. Here is another example of what needs to be removed from the document:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

I need to treat the ID number, 10395, as a variable, but the length stays the same. Also, "Microsoft Windows SMB Shares Enumeration" needs to be treated as a variable too, since it changes throughout the document.

I have tried throwing something like this into replace, but I think I am totally missing the mark.

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

Maybe I should be using a different tool altogether?

What are you trying to transform to what? What should the doc look like after the change? (and is this a line by line match and replace?) — Tezra, Jun 16 '17 at 17:07
@Tezra I am just trying to remove those snippets, so just replacing them with a space or a \n. It is 6 total lines at a time that would need to be replaced if I approach it the way I am currently thinking. — creigel, Jun 16 '17 at 17:09
So you want to remove the display text portion? Can you please add the example of what it should look like after to the question? — Tezra, Jun 16 '17 at 17:12

Vanity Slug - codidact.com · Answer 1 · 2017-06-16T17:33:09.327

score 1

Regexes, ordered from least to most specific. All of them get the job done:

<a.*>.*\d.*</a>

<a.*>.*\d{5}.*</a>

<a.*id=\d{5}.*>.*\d{5}.*</a>

Disclaimer: be careful. You can't parse HTML with regex.
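As a rough check of the three patterns, here is a minimal sketch that runs them against the anchor line from the question. It assumes Python's re module as a stand-in for Notepad++'s regex engine (the syntax used here is common to both), and the two-link line at the end is a made-up example to show why the greedy .* calls for care.

```python
import re

# Anchor line copied from the question's first example.
line = (
    '<td width="10%" valign="top" class="classcell"> '
    '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
    'target="_blank"> 10395</a>'
)

# The three patterns from this answer, least to most specific.
patterns = [
    r'<a.*>.*\d.*</a>',
    r'<a.*>.*\d{5}.*</a>',
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',
]

# On this line each pattern matches the whole <a ...>...</a> element.
matches = [re.search(p, line).group(0) for p in patterns]
for p, m in zip(patterns, matches):
    print(p, '->', m)

# Hypothetical line with two links: the greedy .* runs from the first <a
# to the last </a>, so the text between the links would be deleted too.
two_links = '<a href="x">1</a> keep this <a href="y">2</a>'
greedy = re.search(patterns[0], two_links).group(0)
print(greedy)
```

In the scan report each link sits on its own line, so the greedy match stays inside one element. On denser HTML the more specific patterns, or non-greedy .*?, would be the safer choice.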

edited Jun 16 '17 at 17:33

answered Jun 16 '17 at 17:17


This worked fantastic for the single line. Thank you for the response. – creigel Jun 16 '17 at 17:32

score 1 · Accepted Answer · answered Jun 16 '17 at 17:22

I assume that by repeating \1 multiple times you mean a placeholder for a single character, but that's not how it works: \1 is a backreference to whatever the first capture group matched. What you are trying to achieve is something like this:

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
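To see the capture group and backreference at work, here is a small sketch using Python's re module, assumed here as a stand-in for Notepad++'s regex engine. The literal . and ? in the URL are escaped, because an unescaped ? makes the preceding p optional instead of matching a question mark. The mismatched line is a made-up example.

```python
import re

# (\d+) captures the id from the URL; \1 then requires the link text to
# repeat exactly the same digits. The literal . and ? are escaped.
anchor = re.compile(
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" '
    r'target="_blank"> \1</a>'
)

# Anchor element copied from the question.
good = (
    '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
    'target="_blank"> 10395</a>'
)
# Hypothetical mismatch: the URL id and the link text differ.
bad = good.replace('> 10395<', '> 11219<')

captured = anchor.search(good).group(1)   # the digits captured by (\d+)
mismatch = anchor.search(bad)             # None: \1 must repeat the id

# Without the escapes, "php?view" means "ph", an optional "p", then
# "view", which never matches the literal text "php?view".
unescaped = re.search(r'index.php?view', good)

print(captured, mismatch, unescaped)
```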

To match all six lines:

<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

Then you can replace it with an empty string.
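For a quick test outside the editor, the same removal can be sketched in Python. This assumes Python's re module behaves like Notepad++'s engine for this pattern. The info_row helper and the surviving row are hypothetical; they only rebuild the two samples from the question around a row that must be kept.

```python
import re

# The six-line pattern from above, split across string literals for readability.
ROW = re.compile(
    r'<tr>\s*<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>'
    r'\s*<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>'
    r'\s*</td>\s*<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>'
    r'\s*</tr>'
)


def info_row(plugin_id, title):
    """Hypothetical helper: rebuild one 'Info' row as shown in the question."""
    return (
        '<tr>\n'
        '<td width="20%" valign="top" class="classcell0"><span class="classtext" '
        'style="color: #ffffff; font-weight: bold !important;">Info</span></td>\n'
        '<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id='
        f'{plugin_id}" target="_blank"> {plugin_id}</a>\n'
        '</td>\n'
        '<td width="70%" valign="top" class="classcell"><span class="classtext" '
        f'style="color: #263645; font-weight: normal;">{title}</span></td>\n'
        '</tr>\n'
    )


# Made-up row standing in for content that should stay in the report.
KEEP = '<tr><td>row that should survive</td></tr>\n'

html = (
    info_row(10395, 'Microsoft Windows SMB Shares Enumeration')
    + KEEP
    + info_row(11219, 'Nessus SYN scanner')
)

# Replace every matching block with an empty string and count the removals.
cleaned, count = ROW.subn('', html)
print(count)
print(cleaned.strip())
```

The pattern stops at </tr>, so each removed block leaves its trailing newline behind. Appending \s* to the end of the pattern would swallow those blank lines as well.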

Thank you so much! Worked like a charm! – creigel Jun 16 '17 at 17:32
