My regex function seems to be right but it doesn't work

Question

I'm trying to remove a malicious script from my database.
It was injected into many rows of my table.

The script starts with a <script> tag and ends with a </script> tag.

I'm using the following code to find and replace it:

$content = $post->post_content;
$new_content = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);

I've tested it on regex101.com and it works fine there, but in my code it doesn't work.

Does anyone know what's wrong?

You should *never* parse HTML with regex. Use [a PHP DOM parser](http://simplehtmldom.sourceforge.net/) instead. — Jay Blanchard, Mar 13 '20 at 13:43
`.` does not match newlines unless the `s` modifier is used. You should not use regex for HTML, though. Your groupings and the `+` after the opening script group are also a bit strange. — user3783243, Mar 13 '20 at 13:48
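A minimal sketch of the `s` modifier point above, assuming a hypothetical post body in which the injected script spans several lines. With the question's pattern nothing is replaced; adding `s` (and dropping the redundant group and `+`) removes the script:

```php
<?php
// Hypothetical post content: the injected script spans several lines.
$content = "<p>Hello</p>\n<script>\nvar x = 1;\n</script>\n<p>World</p>";

// Pattern from the question: `.` does not match "\n", so `.+?` cannot
// consume the newline right after <script> and nothing is replaced.
$withoutS = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);

// Same idea with the `s` modifier, so `.` also matches newlines.
$withS = preg_replace('/<script>.*?<\/script>/is', '', $content);

echo $withoutS === $content ? "unchanged\n" : "changed\n";
echo $withS, "\n";
```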

score 0 · Answer 1 · edited Mar 13 '20 at 14:28

Here is my go-to regex for <script>...</script> tags with their contents:

(\<script\>)([\s\S]*?)(<\/script>)

Your pattern fails on scripts that span several lines: `.` does not match newline characters unless the `s` modifier is set, so you're not capturing everything that could be in the contents of the tags.
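Here is a sketch of how the pattern above could be plugged into `preg_replace`. The helper name `remove_script_tags` is hypothetical, and the `i` flag is carried over from the question so that upper-case tags are matched too:

```php
<?php
// Hypothetical helper: strips every <script>...</script> block, including
// blocks whose contents span several lines.
function remove_script_tags(string $content): string
{
    // `\<` and `\>` are just literal < and > in PCRE; [\s\S]*? matches
    // any characters, newlines included, as few as possible.
    $result = preg_replace('/(\<script\>)([\s\S]*?)(<\/script>)/i', '', $content);
    // preg_replace() returns null on error (e.g. backtrack limit), so keep the original then.
    return $result ?? $content;
}

$content = "Intro\n<script>\ndocument.write('bad');\n</script>\nOutro";
echo remove_script_tags($content), "\n";
```

In the question's code this would be called as `$new_content = remove_script_tags($post->post_content);`. Note that the pattern only matches an opening tag written exactly as `<script>`; a tag with attributes, such as `<script type="text/javascript">`, is left in place.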

Here is an explanation of the content capturing group, ([\s\S]*?):

\s matches any whitespace character, including newlines
\S matches any non-whitespace character
so [\s\S] matches any character at all, unlike . without the s modifier
*? matches between zero and unlimited times, as few times as possible, expanding as needed
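To illustrate why the lazy `*?` matters, here is a small sketch on a hypothetical string containing two script blocks, comparing the lazy quantifier with a greedy `*`:

```php
<?php
$content = "<script>a()</script> keep me <script>b()</script>";

// Lazy: each match stops at the first </script>, so the text between the blocks survives.
$lazy = preg_replace('/<script>[\s\S]*?<\/script>/', '', $content);

// Greedy: a single match runs from the first <script> to the last </script>.
$greedy = preg_replace('/<script>[\s\S]*<\/script>/', '', $content);

var_dump($lazy);   // string(9) " keep me "
var_dump($greedy); // string(0) ""
```

With the greedy version, legitimate post content sitting between two injected scripts would be deleted as well.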

As noted in the comments, you really shouldn't do this. You should use a PHP DOM parser instead.
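A rough sketch of the DOM approach. It swaps in PHP's bundled `DOMDocument` class rather than the Simple HTML DOM library linked in the comments, and the function name `remove_scripts_with_dom` is hypothetical. It assumes the content is an HTML fragment; `loadHTML()` treats input as ISO-8859-1 by default, so UTF-8 content may need extra handling:

```php
<?php
// Sketch: remove <script> elements with PHP's DOM extension.
function remove_scripts_with_dom(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a single root; NOIMPLIED stops libxml adding
    // <html>/<body>, NODEFDTD stops it adding a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before removing nodes.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // Serialize the wrapper's children, leaving out the wrapper <div> itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo remove_scripts_with_dom('<p>Hello</p><script type="text/javascript">alert(1)</script><p>World</p>'), "\n";
```

Unlike the regex, this also removes script tags that carry attributes, such as the `type` attribute in the example.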
