How to remove HTML code from multiple pages using regular expressions in Notepad++

Question

I am trying to figure out how to remove this piece of example HTML code from my website. The problem is that the code between <p style="flex: 0 0 auto; margin:0; padding-right:1rem;"> and the closing </p> tag changes from page to page. I need to use regular expressions in Notepad++ to remove it quickly for my organization. Thank you for the help!

<p style="flex: 0 0 auto; margin:0; padding-right:1rem;">Hi my name is <a href="https://facebook.com" style="color:#aaa;">Facebook</a> haha</p>

You might want to (try to) read this answer: [You can't parse (X)HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML (...)](https://stackoverflow.com/a/1732454/1220550) (and it goes on for a while) — Peter B, Nov 08 '21 at 00:40

score 1 · Answer 1 · answered Nov 08 '21 at 01:54

HTML is too complex to be parsed with regular expressions, so you won't be able to do this unless there are certain assumptions you can make about the code you're looking for that will allow it to be found with a regular expression.

Because <p> tags can't be nested inside one another, you should be able to guarantee that everything between your opening tag and the next instance of </p> will be what you want to remove, even if it contains other HTML tags like that <a> tag.

That is, of course, assuming this HTML code isn't invalid in that particular way. And also assuming that there aren't any HTML comments in here that might contain that text.

By default, regular expression quantifiers are greedy. That means any time you have something like .* it will try to match as many characters as it possibly can. This can be a problem when you're trying to match content between an opening pattern and a closing pattern, if the closing pattern might appear multiple times: the match runs to the last occurrence instead of the first. However, you can put ? after the quantifier (as in .*?) to make it non-greedy, so it stops at the first closing pattern it finds.
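To make the difference concrete, here is a minimal sketch in Python rather than Notepad++ (the variable names and the two-paragraph sample string are made up for illustration). It runs the same opening tag with a greedy and a non-greedy quantifier against a line that contains two such paragraphs.

```python
import re

# Opening tag taken from the question; the sample text is invented.
OPENING_TAG = '<p style="flex: 0 0 auto; margin:0; padding-right:1rem;">'
two_paragraphs = OPENING_TAG + 'first</p>' + OPENING_TAG + 'second</p>'

# re.escape keeps any special characters in the tag literal.
greedy = re.findall(re.escape(OPENING_TAG) + r'.*</p>', two_paragraphs)
lazy = re.findall(re.escape(OPENING_TAG) + r'.*?</p>', two_paragraphs)

print(len(greedy))  # 1: a single match running to the last </p>
print(len(lazy))    # 2: one match per paragraph
```

The greedy version swallows both paragraphs and everything between them in one match, which is exactly what you don't want when other content sits between two occurrences.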

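Try using a regular expression like this to match those HTML tags you're looking for. It is written in /pattern/flags notation; in Notepad++, enter only the part between the slashes in the "Find what" field, set the search mode to "Regular expression", and leave "Replace with" empty: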

/<p style="flex: 0 0 auto; margin:0; padding-right:1rem;">.*?<\/p>/g
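If you would rather script the cleanup than run it from Notepad++, the same pattern can be applied with a few lines of Python. This is only a sketch under the assumptions already stated above (valid HTML, no comments containing the tag); the function names and the idea of a folder of .html files are hypothetical, not something from the question.

```python
import re
from pathlib import Path

# Opening tag copied from the question.
OPENING_TAG = '<p style="flex: 0 0 auto; margin:0; padding-right:1rem;">'

# re.DOTALL lets "." cross line breaks, in case a paragraph spans several lines.
PATTERN = re.compile(re.escape(OPENING_TAG) + r'.*?</p>', re.DOTALL)


def strip_paragraphs(html: str) -> str:
    """Remove every paragraph that starts with OPENING_TAG."""
    return PATTERN.sub('', html)


def clean_folder(folder: str) -> None:
    """Rewrite each .html file under folder in place (hypothetical layout; back up first)."""
    for path in Path(folder).rglob('*.html'):
        original = path.read_text(encoding='utf-8')
        cleaned = strip_paragraphs(original)
        if cleaned != original:
            path.write_text(cleaned, encoding='utf-8')


sample = (
    '<div>'
    '<p style="flex: 0 0 auto; margin:0; padding-right:1rem;">Hi my name is '
    '<a href="https://facebook.com" style="color:#aaa;">Facebook</a> haha</p>'
    '<p>keep me</p>'
    '</div>'
)
print(strip_paragraphs(sample))  # <div><p>keep me</p></div>
```

In Notepad++ itself, the ". matches newline" checkbox plays the role of re.DOTALL here, and "Find in Files" applies the replacement to a whole folder.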
