1

I need help from some regex experts to fix a bug in a WordPress plugin that is no longer maintained by its author.

Inside the plugin, the following PHP pattern is used to find included scripts:

'/(\\s*)(<script\\b[^>]*?>)([\\s\\S]*?)<\\/script>(\\s*)/i'

This pattern matches script tags regardless of the media they are written for. To fix a bug, it must be changed so that script tags with the attribute media="print" are not extracted.

How must this line be changed so that script tags with the attribute media="print" are not affected?

See here for the topic in the WordPress-Support-Forum.

GeroB
    I'd add a test after the regex to check the captured `( – jswolf19 Mar 08 '11 at 13:02
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – alimbada Mar 08 '11 at 13:06
  • The double escaping of backslashes is unneeded. Also, is the regex used with a simple preg_replace call, or with preg_replace_callback (where jswolf19's advice would be easiest to implement)? – mario Mar 08 '11 at 13:08
  • The whole passage is: `$this->_html = preg_replace_callback( '/(\\s*)( – GeroB Mar 08 '11 at 21:23
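
The suggestion from the comments above, keeping the original pattern and testing the captured opening tag inside the callback, could be sketched roughly as follows. The callback name `extract_script_unless_print` and its return value are hypothetical stand-ins, since the plugin's real callback is not shown in this thread; only the pattern is taken from the question (with the doubled backslashes reduced to single ones).

```php
<?php
// Hypothetical stand-in for the plugin's callback (the real one is not shown in the thread).
function extract_script_unless_print(array $m): string
{
    // $m[2] holds the captured opening <script ...> tag.
    // Accept media="print", media='print' and media=print.
    if (preg_match('/\smedia\s*=\s*["\']?print\b/i', $m[2])) {
        return $m[0]; // put the whole match back unchanged
    }
    // Illustrative only: drop the script and keep the surrounding whitespace.
    return $m[1] . $m[4];
}

// Pattern from the question, written with single backslashes.
$pattern = '/(\s*)(<script\b[^>]*?>)([\s\S]*?)<\/script>(\s*)/i';

$html = "<p>a</p>\n"
      . "<script media=\"print\">keep();</script>\n"
      . "<script type=\"text/javascript\">drop();</script>\n"
      . "<p>b</p>";

$result = preg_replace_callback($pattern, 'extract_script_unless_print', $html);
echo $result;
```

This leaves the regex itself untouched, so the four capture groups the plugin relies on stay exactly as they were. Note that the check would also skip a tag such as media="print, screen".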

3 Answers

2

The preg functions are not meant to match HTML tags. You never know where and how the attributes will be defined:

<script media="print">
<script media=print>
<script type="text/javascript" media="print">
<script media="print" type="text/javascript">

Basically, you cannot handle this well with regular expressions. I'd suggest loading the HTML you want to clean into a DOM (or even SimpleXML) object and selecting all script tags whose media attribute is "print" with an XPath query:

//script[@media="print"]
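
As a minimal sketch of that idea, assuming the page is available as a string and PHP's DOM extension is loaded, the query could be run as shown below. The sample markup and variable names are made up for illustration. In XPath, attributes are addressed with `@`, so the attribute test is written `@media="print"`.

```php
<?php
// Sample markup (made up for illustration); note the unquoted media=print variant.
$html = '<html><head>'
      . '<script media="print">a();</script>'
      . '<script type="text/javascript" media=print>b();</script>'
      . '<script type="text/javascript">c();</script>'
      . '</head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Script elements whose media attribute is exactly "print".
$printScripts = $xpath->query('//script[@media="print"]');

// All other script elements, i.e. the ones a cleaner would want to extract.
$otherScripts = $xpath->query('//script[not(@media="print")]');

echo $printScripts->length, "\n";
echo $otherScripts->length, "\n";
```

Because the parser normalises the attribute syntax, the quoted and unquoted spellings listed above are treated alike, which is the main advantage over a raw string match.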
tsadiq
  • While I would normally agree with you, I'm assuming GeroB doesn't want to rewrite the entire plugin to get rid of regexes, and what you suggest is likely a complete overhaul. – jswolf19 Mar 08 '11 at 13:44
  • Yes, Tsadiq, I also agree with you, but jswolf19 is right: I just would like to modify this line. I don't want and can't rewrite the whole plugin. – GeroB Mar 08 '11 at 21:01
  • But another thing makes it easier: the modification is only necessary not to filter out the code for the admin bar, which is always with the parameter media="print" (and not 'print" or other possibilities). The line is always: – GeroB Mar 08 '11 at 21:15
0

A pretty simple approach would be:

'#<script\b(?:\s+(?!media="print")[^\s>]+)*\s*>(.*?)</script>#i'

It uses a (?!..) negative assertion to look at each string part after a space. This will not exactly match HTML attributes, but is sufficient to detect the single case. You might need to add alternatives though (media=print or media='print') because preg_match is looking for raw strings, not interpreting HTML-equivalent expressions. (Using DOM however would certainly be overkill for this task.)
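
A possible variant that also covers points raised in the comments below is sketched here. It is an assumption-laden adaptation, not the plugin's code: it restores the four capture groups and the surrounding `(\s*)` of the original pattern, uses `[\s\S]*?` so that multi-line script bodies match, and adds the `media='print'` and `media=print` alternatives to the lookahead. The callback is purely illustrative.

```php
<?php
// Adapted pattern (assumption: the plugin needs the same four groups as the original):
//   1 = leading whitespace, 2 = opening tag, 3 = script body, 4 = trailing whitespace.
$pattern = '#(\s*)(<script\b(?:\s+(?!media=(?:"print"|\'print\'|print\b))[^\s>]+)*\s*>)([\s\S]*?)</script>(\s*)#i';

$html = "<script media=\"print\">admin();</script>\n"
      . "<script type=\"text/javascript\" media='print'>alsoAdmin();</script>\n"
      . "<script type=\"text/javascript\">\nextracted();\n</script>\n";

$extracted = [];
$result = preg_replace_callback($pattern, function (array $m) use (&$extracted) {
    $extracted[] = $m[3];   // remember the script body
    return $m[1] . $m[4];   // illustrative replacement: keep only the whitespace
}, $html);

echo $result;
print_r($extracted);
```

Only the third script tag is matched here; the two tags carrying a print media attribute fail the lookahead and stay in the output untouched.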

mario
  • It should be ok with script tags, but there may be attributes with values including space characters. – jswolf19 Mar 08 '11 at 13:45
  • @jswolf19: There might be, but it wouldn't matter for this regex since it only needs to blacklist and doesn't need to care about syntactic correctness. For example ` – mario Mar 08 '11 at 13:51
  • @mario, that's true. Of course, it brings up the question that @GeroB didn't specify: what about a script with multiple medias? e.g. media="print, screen" – jswolf19 Mar 08 '11 at 13:55
  • And then again, what if the attribute value has no quotes, like `media=print`, or the odd (but I have already seen it) `media='print'`? – tsadiq Mar 08 '11 at 13:57
  • @jswolf19: Sounded more like it was a specific single instance he wanted to filter. (The real question is, why isn't WP configurable to omit any unneeded script areas except that admin thingy bar.) – mario Mar 08 '11 at 13:57
  • @Tsadiq: Already mentioned that with " You might need to add alternatives though (media=print or media='print') ..." – mario Mar 08 '11 at 13:58
  • This line does not work. I think it changes the original "surrounding" of the line with a (\\s*)( before the – GeroB Mar 08 '11 at 21:18
  • @GeroB: Well yes. You'll have to add the capture groups or `\s*` matches again if they are required. (No mention of that in your question.) – mario Mar 08 '11 at 21:23
-1

To remove tags, use strip_tags according to your needs.

xkeshav
  • and how would you make `strip_tags` consider attributes? Also, `strip_tags` is using a whitelist to allow tags which means the OP would have to list all other HTML elements if the aim is just to remove script elements. – Gordon Mar 08 '11 at 13:10