
PHP's regex pattern syntax really gives me headaches... I'm trying to match all JavaScript script tags except those with id="pagespeed", so that I can move them somewhere else. All I need is the pattern; everything else is done.

I currently have this:

  $jsPattern = '#<script.*</script>#isUm';

which finds all script tags. Now I need to exclude the ones that meet the exception. It should be something like:

 ~^<script.+id=\"pagespeed\".*</script>]~

The line is probably wrong and needs to be combined with the line above. Would be great if someone could help me as I seem to suck at this PCRE syntax :(

Casimir et Hippolyte
Daniel
  • Using DOMXPath would be simpler for this. – Casimir et Hippolyte May 31 '15 at 12:44
  • I wouldn't use a regex for this. Regex for HTML [has problems](http://blog.codinghorror.com/parsing-html-the-cthulhu-way/) due to HTML (and XML) being a nested language. Consider using [DomDocument](http://stackoverflow.com/a/774853/2370483) instead. – Machavity May 31 '15 at 12:44
  • @Machavity: PCRE **is able** to parse nested structures. – Casimir et Hippolyte May 31 '15 at 12:45
  • @CasimiretHippolyte I never said it couldn't. I said it has problems. In this case, the OP would be better off with a DomDocument solution that would do the same thing without any issues. – Machavity May 31 '15 at 12:50

2 Answers


Since you are dealing with structured data, the simplest way is to use the structure and query it rather than treating the document as text. This approach also keeps you out of the many traps that HTML code can contain.

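// $html is assumed to hold the page markup as a string
$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);

// select every script element whose id attribute is not "pagespeed"
// (script elements without an id are selected too)
$scriptNodeList = $xp->query('//script[not(@id="pagespeed")]');

// detach each selected script from its parent
foreach ($scriptNodeList as $scriptNode) {
    $scriptNode->parentNode->removeChild($scriptNode);
}

echo $dom->saveHTML();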
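
If the goal is to move the scripts rather than delete them, the same query can be reused. The following is only a minimal sketch: the sample `$html` string and the choice of the end of `<body>` as the destination are assumptions made for illustration, not something stated in the question.

```php
<?php
// Hypothetical sample input; the real markup would come from the framework.
$html = '<html><head><script id="pagespeed">var a = 1;</script>'
      . '<script src="app.js"></script></head>'
      . '<body><p>Hello</p><script>var b = 2;</script></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Assumed destination: the end of the body element.
$body = $dom->getElementsByTagName('body')->item(0);

// Appending a node that is already in the document moves it
// instead of copying it.
foreach ($xp->query('//script[not(@id="pagespeed")]') as $scriptNode) {
    $body->appendChild($scriptNode);
}

$result = $dom->saveHTML();
echo $result;
```

With this input, the script with id="pagespeed" stays in the head while the other two end up after the paragraph in the body.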
Casimir et Hippolyte
  • Thank you for your help... this would be the better solution, but I was just trying to quickly fix an already existing regex syntax in a bigger framework. So I have to go with Karthik's solution, sorry – Daniel Jun 01 '15 at 13:13

I would use @Casimir's answer for this purpose. If you are looking for a regex, use the pattern below:

<script[^>]*id="(?!pagespeed\b)[^"]+".*<\/script>

See DEMO
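
Note that the pattern above only matches script tags that carry an id attribute, so scripts with no id at all are skipped. One possible variant, sketched here under some assumptions, puts a negative lookahead right after `<script` so that tags without an id are matched too. The sample `$html` is hypothetical, the sketch assumes the attribute is written exactly as `id="pagespeed"` with double quotes, and it drops the `U` flag from the question's pattern because `U` would invert the lazy `.*?`.

```php
<?php
// Hypothetical sample input for illustration only.
$html = '<script id="pagespeed">a();</script>'
      . '<script src="x.js"></script>'
      . '<script id="other">b();</script>';

// (?![^>]*\sid="pagespeed") rejects an opening tag that contains
// id="pagespeed"; [^>]* cannot cross the end of the opening tag.
// .*? then stops at the first closing </script>.
$jsPattern = '#<script\b(?![^>]*\sid="pagespeed")[^>]*>.*?</script>#is';

preg_match_all($jsPattern, $html, $matches);
print_r($matches[0]);
```

Here `$matches[0]` holds the second and third script tags and leaves out the one with id="pagespeed". Like any text-based approach, this can still be confused by script bodies that contain a literal `</script>` inside a string.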

karthik manchala