I have written a regular expression that extracts inline styles from any HTML string: https://regex101.com/r/YBdbxS/4. For context, I have written a custom WordPress plugin that extracts the inline styles from the current page and writes them to a separate file.

To make the regular expression work again, remove one #myid { color: green; } rule from the snippet above; it then matches as expected.

The regular expression worked really well, but now I get a catastrophic backtracking error because the inline style is too long. The WordPress theme I use emits a very long inline style, so I cannot change the style itself (e.g. split it into multiple styles).

Temporary solution

I have found a workaround: raising pcre.backtrack_limit, for example

ini_set("pcre.backtrack_limit", "23001337");

But this seems to be bad practice.
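
Before raising the limit, it helps to confirm that the limit is really what makes the match fail. The sketch below is only an illustration: match_all_or_fail() is a hypothetical helper, not part of the plugin. It relies on preg_match_all() returning false on failure and on preg_last_error() reporting PREG_BACKTRACK_LIMIT_ERROR in that case.

```php
<?php
// Hypothetical helper (not from the original plugin): run a pattern and
// report a PCRE failure instead of treating it as "no matches".
function match_all_or_fail(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    if ($count === false) {
        // preg_match_all() returns false on failure; preg_last_error()
        // tells us why, e.g. PREG_BACKTRACK_LIMIT_ERROR.
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'pcre.backtrack_limit exhausted'
            : 'PCRE error code ' . $code;
        throw new RuntimeException($reason);
    }

    return $matches;
}
```

Calling it with the lazy pattern and the page markup either returns the matches or throws with the reason. Note that with the PCRE JIT enabled, a different error code (such as PREG_JIT_STACKLIMIT_ERROR) may be reported instead.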

Is there any alternative, e.g. anchoring the pattern so that it does not backtrack again and again from the first match in the (.*?) group?

Matthias Günter
    Yes, the alternative is to use a DOM parser to parse (X)HTML. Do not use regex. You may play with ` – Wiktor Stribiżew May 07 '21 at 13:00
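
A minimal sketch of the DOM parser route from the comment above, assuming PHP's bundled DOM extension (DOMDocument) is available. extract_inline_styles() is a hypothetical name, and "inline styles" is taken here to mean the contents of <style> elements.

```php
<?php
// Hypothetical helper: collect the text of every <style> element.
function extract_inline_styles(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world markup is rarely valid; keep parser warnings internal
    // instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $styles = [];
    foreach ($doc->getElementsByTagName('style') as $node) {
        $styles[] = $node->textContent;
    }

    return $styles;
}
```

No regular expression is involved, so pcre.backtrack_limit does not come into play. One caveat: loadHTML() falls back to ISO-8859-1 when the markup declares no charset, so non-ASCII CSS may need an explicit encoding hint.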

1 Answer

As mentioned by Wiktor, the answer is to use a negative lookahead (?!...) inside a non-capturing group (?:...). Together they replace the lazy (.*?) group.

<style([^>]*)>([^<]*(?:<(?!\/style>)[^<]*)*)<\/style>
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|       |  *1
                                            ^^^^^^^^^  *2
  1. Captures all the content before </style>: any run of characters other than <, plus any < that does not start </style>. The lookahead must contain the same text as your "closing" pattern in *2.
  2. Your "closing" pattern
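
A minimal sketch of using the pattern above from PHP. The ~ delimiters, the extract_style_blocks() name and the array keys are choices made here for illustration, not part of the original plugin.

```php
<?php
// Hypothetical helper: apply the pattern from this answer and return the
// attributes and CSS of each <style> block.
function extract_style_blocks(string $html): array
{
    // Same pattern as above, wrapped in "~" delimiters.
    $pattern = '~<style([^>]*)>([^<]*(?:<(?!\/style>)[^<]*)*)<\/style>~';

    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER) === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    $blocks = [];
    foreach ($matches as $m) {
        $blocks[] = [
            'attributes' => $m[1], // e.g. ' media="print"', may be empty
            'css'        => $m[2], // everything up to the closing tag
        ];
    }

    return $blocks;
}

$blocks = extract_style_blocks(
    '<style media="print">a::before { content: "<"; }</style>'
);
echo $blocks[0]['css'], PHP_EOL;
```

Each run of characters without a < is consumed by a single [^<]* step, so the match does not need to retry the closing tag at every character the way the lazy (.*?) group does.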
Matthias Günter