The following regex catches all HTML style tags:

[^noscript\>]<style[^>]*>([^<]+)?<[\s\/]+style>

The first part, [^noscript\>], is meant to ignore any style tag wrapped in a noscript tag.

The problem is that the pattern also returns an unwanted character on the left side of each match. How can I avoid that? See this example: https://regex101.com/r/aA6ihs/1/
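
For readers who cannot open the regex101 link, the behavior can be reproduced with a minimal PHP sketch. The sample HTML and the variable names are assumptions made for illustration and are not taken from the original page. It suggests that the extra character comes from [^noscript\>] itself: a negated character class consumes exactly one character that is not n, o, s, c, r, i, p, t or >.

```php
<?php
// Hypothetical sample input, not taken from the original page.
$html = "<head>\n<style type=\"text/css\">a{color:red}</style>\n</head>";

// The pattern from the question, wrapped in / delimiters.
$re = '/[^noscript\>]<style[^>]*>([^<]+)?<[\s\/]+style>/';

preg_match($re, $html, $m);

// The negated character class matches one character before <style,
// so the full match starts with the newline that precedes the tag.
var_dump($m[0][0] === "\n");

// Group 1 still holds only the CSS body.
var_dump($m[1]);
```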

WP-Silver
  • I will just mention https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Nigel Ren Jun 08 '19 at 06:59
  • Thanks for your answer, but I have no other option here! – WP-Silver Jun 08 '19 at 07:01
  • If you can apply a regular expression to some HTML text, you can also parse it properly. So, yes, you do have options. – Peter Jun 08 '19 at 07:03
  • @Peter I tried DOMDocument(), but it doesn't work correctly. It modifies the parsed HTML, which is definitely not acceptable here. – WP-Silver Jun 08 '19 at 07:05
  • I know this means extra work, but may I suggest posting another question describing how you tried to solve it with DOMDocument and what failed? Perhaps we can help you sort the problem out that way. – Nigel Ren Jun 08 '19 at 07:06

2 Answers

While this would be better done with an HTML parser, you can skip over all the <noscript> elements with (*SKIP)(*FAIL): the first alternative tries to match <noscript>...</noscript>, and if it matches, the pattern is forced to fail at that point, so the engine discards the consumed text and resumes searching after the closing tag:

<noscript>.*?<\/noscript>(*SKIP)(*FAIL)|<style[^>]*>([^<]+)?<[\s\/]+style>

https://regex101.com/r/aA6ihs/3
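
As a rough illustration of how this could look in PHP, here is a minimal sketch. The sample HTML, the ~ delimiter and the s and i flags are assumptions rather than part of the pattern above. The s flag is added so that .*? can span the line breaks inside a <noscript> block.

```php
<?php
// Hypothetical sample input; the structure mirrors the question's use case.
$html = <<<'HTML'
<head>
<style type="text/css">.a{color:red}</style>
<noscript><style>
.b{color:blue}
</style></noscript>
</head>
HTML;

// s (DOTALL) lets .*? cross newlines inside <noscript>...</noscript>;
// i makes tag names case-insensitive. Both flags are assumptions.
$re = '~<noscript>.*?</noscript>(*SKIP)(*FAIL)|<style[^>]*>([^<]+)?<[\s/]+style>~si';

preg_match_all($re, $html, $matches);

// $matches[0] holds the full <style> elements found outside <noscript>,
// $matches[1] holds their CSS bodies.
print_r($matches[1]);
```

With this input only the first style element should be reported, and the match no longer carries a leading character because nothing is consumed before <style.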

CertainPerformance

Here, we can simply capture the noscript tags in their own group, add an if statement to ignore those matches, and return the desired output with a simple expression such as:

(<noscript>)[\s\S]+?<\/noscript>|<style(.+?)>(.+?)<\/style>

Test

$re = '/(<noscript>)[\s\S]+?<\/noscript>|<style(.+?)>(.+?)<\/style>/mi';
$str = '<!DOCTYPE html>
<html lang="en-US">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" />
        <style type="text/css"></style>
<noscript><style>

< / style></noscript>
                    <!-- Twitter Cards Meta by USM  STARTS-->
                <meta name="twitter:card" content="summary" />


        <style type="text/css">.recentcomments a{display:inline !important;padding:0 !important;margin:0 !important;}</style>

<link rel="pingback" href="/xmlrpc.php">
<noscript><style>

< / style></noscript>
        ';

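// PREG_SET_ORDER yields one array per match; group 1 is '<noscript>' only
// when the first alternative (a whole noscript block) matched.
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $value) {
    // Skip noscript blocks; for <style> matches, group 3 is the CSS body.
    if ($value[1] !== '<noscript>') {
        echo $value[3];
    }
}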

Output

.recentcomments a{display:inline !important;padding:0 !important;margin:0 !important;}
Emma