
I want to parse the options passed to the uploadify method in this structure:

<script>
$(function() {
  $("#file_upload_1").uploadify({
    uploader : '/uploadify/uploadify.php',
  });
});
</script>

I'm able to extract everything between script tag using:

$matches = array(); reg_match('/.*<script>(.*)<script>/s', $s, $matches);

but I don't know how to move further. I need to remove everything in $matches[1] before "uploadify({" (because it is the keyword) and after the first occurrence of "});".

3 Answers


There seems to be an escaped slash missing from your pattern:

/.*<script>(.*)<\/script>/s

Changing the delimiters would be easier, though:

~.*<script>(.*)</script>~s

Note that this will get you the last <script> pair in your input (.* will consume as much as possible, pushing the two tags as far back as possible). If that's what you want, fair enough.
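To make the greedy behaviour concrete, here is a minimal sketch. The two-block input string is a made-up example, not the asker's page; it only shows that the leading .* pushes the match to the last pair, while a lazy (.*?) without the leading .* stops at the first one.

```php
<?php
// Hypothetical input with two script blocks.
$html = '<script>first();</script><p>text</p><script>second();</script>';

// Greedy: the leading .* swallows as much as possible, so the capture is the LAST block.
preg_match('~.*<script>(.*)</script>~s', $html, $greedy);

// Lazy and without the leading .*: the capture is the FIRST block.
preg_match('~<script>(.*?)</script>~s', $html, $lazy);

echo $greedy[1], "\n"; // second();
echo $lazy[1], "\n";   // first();
```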

In any case, it would be better to obtain the contents of the script tag by using a DOM parser. It's more robust, more readable, and whatnot. Here is an overview of your options to do that using PHP.
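As a rough sketch of the DOM route, PHP's bundled DOMDocument can collect the contents of every script element. This assumes the dom extension is enabled; the $html fragment below is a stand-in for the real page, and the $scripts array is just an illustrative name.

```php
<?php
// Stand-in for the real page; in practice $html would hold the fetched document.
$html = <<<'HTML'
<html><body><script>
$(function() {
  $("#file_upload_1").uploadify({
    uploader : '/uploadify/uploadify.php',
  });
});
</script></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // real-world HTML is rarely valid; keep warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

// Collect the source text of every <script> element.
$scripts = array();
foreach ($doc->getElementsByTagName('script') as $node) {
    $scripts[] = $node->textContent;
}

echo $scripts[0];
```

Each entry of $scripts can then be searched for the uploadify call without worrying about where the script tags begin and end.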

Now for your actual question. Again, a JavaScript or JSON parser might help, but using regular expressions, you could use a non-greedy repetition, to make sure your match only goes up to the first });:

/uploadify[(][{](.*?)[}][)];/s

The main problem with this approach is that }); could occur inside a string or comment, or could even be the end of a nested anonymous function. Although PCRE provides the recursion (?R) construction in regular expressions, trying to use it to parse JavaScript is bound to melt your brain (and that of anyone trying to understand the code in the future). That's why (for a robust solution) this should really be tackled using some kind of JavaScript parser.

You might even be better off just looking for uploadify and then going through the rest of the string character by character, counting the different kinds of nesting levels JavaScript has, to make sure you find the right });.
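A minimal sketch of that character-by-character scan might look like the following. The helper name extract_call_argument is made up for illustration, and it is deliberately simplified: it uses one counter for all bracket kinds and skips over '...' and "..." strings, but it does not understand comments, regex literals, or template literals.

```php
<?php
/**
 * Hypothetical helper: return the text between the parentheses of the first
 * call to $name in $js, or null if no balanced call is found.
 */
function extract_call_argument($js, $name)
{
    $start = strpos($js, $name . '(');
    if ($start === false) {
        return null;
    }
    $open  = $start + strlen($name);   // index of the opening "("
    $depth = 0;                        // shared nesting counter for (), {} and []
    $quote = null;                     // current string delimiter while inside a string
    $len   = strlen($js);

    for ($i = $open; $i < $len; $i++) {
        $c = $js[$i];
        if ($quote !== null) {
            if ($c === '\\') {
                $i++;                  // skip the escaped character
            } elseif ($c === $quote) {
                $quote = null;         // string closed
            }
            continue;
        }
        if ($c === '"' || $c === "'") {
            $quote = $c;
        } elseif ($c === '(' || $c === '{' || $c === '[') {
            $depth++;
        } elseif ($c === ')' || $c === '}' || $c === ']') {
            $depth--;
            if ($depth === 0) {
                // Back at the level of the opening "(": this is its match.
                return substr($js, $open + 1, $i - $open - 1);
            }
        }
    }
    return null;                       // unbalanced input
}

// Made-up input with a "});" inside a string and a nested function.
$js  = '$("#file_upload_1").uploadify({ onError : function() { log("});"); }, uploader : "/u.php" });';
$arg = extract_call_argument($js, 'uploadify');
echo $arg, "\n";
```

Unlike the non-greedy regex, this does not stop at the }); inside the string or at the end of the nested function.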

Community
Martin Ender

You seem to have dropped the p from the function name: PHP provides preg_match, not reg_match. That aside, you can use the following regex to match the uploadify call:

preg_match('/<script>.*?(uploadify\({.*?}\);).*<\/script>/s', $s, $matches);
techstunts

Since what you really want is the options, a JS object, here is a deathstar-sized overkill solution that parses them natively. The idea is to extract the object literal between { and }, and then evaluate it into a usable structure with PHP's json_decode. In this code, the HTML fragment is stored in $variable0.

// expression broken down for readability
$frag = array(
    "/<script>",
    ".*?",             # anything before the call (lazy)
    "uploadify\(",
        "(.*?)",       # our desired match
    "\);",             # closest );
    "(.*)",            # rest of the script, which we don't want
    "<\/script>/s"
);

// flatten expression into match string
$expr = implode("", $frag );

So at this point, $expr is /<script>.*?uploadify\((.*?)\);(.*)<\/script>/s

$m = preg_match( $expr, $variable0, $r );

Now $r should be an array in which $r[1] contains the "{ ... }" snippet. It can be evaluated with json_decode, but as it stands the string is not valid JSON. For one, JSON requires keys to be enclosed in double quotes (i.e. uploader: '' should be "uploader": ""). Literally, $r[1] looks like this:

{
    uploader : '/uploadify/uploadify.php',
  }

Another person came up with a cleaning function that we can apply here.

// fix thanks to http://stackoverflow.com/a/14075951/415324
function fix_json( $a ) {
    $a = preg_replace('/(,|\{)[ \t\n]*(\w+)[ ]*:[ ]*/','$1"$2":',$a);
    $a = preg_replace(
      '/":\'?([^\[\]\{\}]*?)\'?[ \n\t]*(,"|\}$|\]$|\}\]|\]\}|\}|\])/','":"$1"$2',
    $a);

    return( $a );
}

// $r[1] will contain the innards of uploadify(), a JS object literal that is not yet valid JSON
$json = fix_json( $r[1] );

This turns $json into something that PHP can parse natively. $json now looks like:

{"uploader":"/uploadify/uploadify.php',"}

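Note the stray ', at the end of the value up there. It comes from the trailing comma in the original HTML you're extracting, which is a JavaScript error that needs to be fixed on the site. More on that below. With the trailing comma removed, $json becomes {"uploader":"/uploadify/uploadify.php"}, which is what the dump further down reflects.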

$options = json_decode( $json );

And at this point, we have an object we can use in PHP:

var_dump( $options );

object(stdClass)#2 (1) {
  ["uploader"]=>
  string(24) "/uploadify/uploadify.php"
}

Thus, you can easily access any additional options you encounter, with echo $options->uploader;

NOTE: There is a problem with the original HTML: it contains a trailing comma that breaks JavaScript parsing in some browsers. I think Firefox will cut it some slack, but certainly not IE. To fix the JS, remove the trailing comma in the options object:

$("#file_upload_1").uploadify({
  uploader : '/uploadify/uploadify.php'
});
pp19dd
  • Great idea, but fix_json( $r[1] ) returns the same string - { uploader : "/uploadify/uploadify.php" } - what can cause that? – Alega Agafonov Apr 28 '13 at 17:44
  • It shouldn't - I posted the actual return I get after "$json now looks like". – pp19dd Apr 28 '13 at 18:02
  • found it! braces should be removed – Alega Agafonov Apr 28 '13 at 18:07
  • Hi again, actually situation is quite strange for me, after i've done preg_match( "/ – Alega Agafonov Apr 28 '13 at 20:50
  • Check whether that server escapes strings automatically from ' to \', etc. Sometimes you have to take a post/get input and filter it, like $input = stripslashes( $_GET['code'] ); – pp19dd Apr 28 '13 at 22:11
  • Now it is fine; all I needed was to replace "\r\n" and "\t". Thanks for the good working idea! – Alega Agafonov Apr 29 '13 at 07:30
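The fix from the last comment can be sketched as follows. fix_json is copied from the answer above; the $raw string with Windows line endings and a tab is an assumed reproduction of the asker's input, not the actual data. The first pattern in fix_json only skips spaces, tabs, and "\n" after the opening brace, so a "\r" leaves the string unchanged, which would explain the "returns the same string" symptom.

```php
<?php
// fix thanks to http://stackoverflow.com/a/14075951/415324 (as quoted in the answer)
function fix_json( $a ) {
    $a = preg_replace('/(,|\{)[ \t\n]*(\w+)[ ]*:[ ]*/','$1"$2":',$a);
    $a = preg_replace(
      '/":\'?([^\[\]\{\}]*?)\'?[ \n\t]*(,"|\}$|\]$|\}\]|\]\}|\}|\])/','":"$1"$2',
    $a);

    return( $a );
}

// Assumed input: the options object with "\r\n" line endings and a tab indent.
$raw = "{\r\n\tuploader : '/uploadify/uploadify.php'\r\n}";

// Without normalisation nothing is rewritten, so json_decode() fails.
$broken = json_decode( fix_json( $raw ) );

// Normalise line endings and tabs first, then clean and decode.
$normalized = str_replace( array("\r\n", "\r", "\t"), array("\n", "\n", " "), $raw );
$options    = json_decode( fix_json( $normalized ) );

var_dump( $broken );          // NULL
echo $options->uploader, "\n"; // /uploadify/uploadify.php
```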