I'm trying to remove a script containing malware from my database.
It was injected into many rows of my table.

The script starts with a <script> tag and ends with a </script> tag.

I'm using the following code to find and replace it:

$content = $post->post_content;
$new_content = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);

I've tested it on regex101.com and it works fine there, but in my code it doesn't work.

Does anyone know what's wrong?

fackz
  • You should *never* parse HTML with regex. Use [a PHP DOM parser](http://simplehtmldom.sourceforge.net/) instead. – Jay Blanchard Mar 13 '20 at 13:43
  • `.` does not extend to new lines unless `s` modifier is used. You should not use regex for HTML though. Your groupings and the `+` on the opening script also are a bit strange. – user3783243 Mar 13 '20 at 13:48
  • "Now you have two problems." – Mar 13 '20 at 14:36

1 Answer

Here is my go-to regex for matching <script>...</script> tags along with their contents:

(\<script\>)([\s\S]*?)(<\/script>)

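Your pattern fails mainly because the dot (.) does not match newlines unless the s modifier is set, so a script that spans several lines is never matched. The character class [\s\S] matches any character at all, newlines included, so it captures everything that could be in the contents of the tags. The + after your first group isn't needed either.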

Here is an explanation of the content capturing group:

\s matches any whitespace character
\S matches any non-whitespace character
*? matches between zero and unlimited times, as few times as possible, expanding as needed
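
As a rough sketch of the difference, the snippet below runs both patterns over a small multi-line sample. The sample string is made up for illustration and stands in for $post->post_content; only preg_replace and the two patterns come from this thread.

```php
<?php
// Hypothetical sample standing in for $post->post_content.
$content = "<p>Hello</p>\n<script>\nalert('injected');\n</script>\n<p>World</p>";

// Pattern from the question: "." stops at newlines, so nothing matches
// and preg_replace returns the string unchanged.
$unchanged = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);

// Pattern from this answer: [\s\S]*? also crosses newlines, lazily,
// so the match ends at the first closing tag.
$new_content = preg_replace('/(\<script\>)([\s\S]*?)(<\/script>)/i', '', $content);

echo $new_content;
```

Note that both patterns only match a bare <script> tag; an opening tag with attributes (for example a src or type attribute) would need a looser pattern, which is one more reason to prefer a parser.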

As I stated before, you really shouldn't do this with regex. You should use a PHP DOM parser instead.
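
For illustration, here is a minimal sketch of the parser route using PHP's built-in DOMDocument rather than the Simple HTML DOM library linked in the comments. The sample string and the wrapping <div> are assumptions made for the example, and real post content with non-ASCII characters may need extra encoding handling that is not shown here.

```php
<?php
// Hypothetical sample standing in for $post->post_content.
$content = "<p>Hello</p><script>alert('injected');</script><p>World</p>";

// Collect parser warnings internally instead of emitting them,
// since stored post content is rarely perfectly valid HTML.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Wrap the fragment in a <div> so it has a single root element, and ask
// libxml not to add <html>/<body> or a doctype around it.
$dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// getElementsByTagName returns a live list, so copy it to an array
// before removing nodes from the document.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// Serialize the children of the wrapper <div>, leaving the wrapper itself out.
$new_content = '';
foreach ($dom->documentElement->childNodes as $child) {
    $new_content .= $dom->saveHTML($child);
}

echo $new_content;
```

Unlike the regex, this also removes script tags that carry attributes or unusual spacing, because the matching is done on parsed elements rather than on raw text.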

Funk Forty Niner
Jay Blanchard