My regex skills are poor (the pattern is run from PHP), and I'm struggling to isolate JavaScript within HTML blocks. The following regex partially works, but it breaks when the word "on" appears in the text content rather than inside a < tag >.

$regex = "/<script.*?>.*?<\/script.*?>(*SKIP)(*F)|((\\bon(.*?=)(.*?))(\'|\")(.*?)(\\5))/ism";

$html = preg_replace_callback($regex,
           function ($matches) {
               $mJS = $matches[2] . $matches[5] . myFunction($matches[6]) . $matches[5];
               return $mJS;
           },
           $html);

I think the issue is that the \bon.... part needs to be qualified to be inside a < tag > before being considered, but I just don't know how.

Running the following test...

$html= "<div id='content' onClick='abc()'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>abc();</script>";

Returns...

<div id='content' onClick='****abc()****'>Lorem On='****abc****' ipsum on to</div>
<input id='****a****' type='range'>
<input id='b' type='range'>
<script>abc();</script>

but I wanted...

<div id='content' onClick='****abc()****'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>****abc();****</script>

I have a sandbox running this if you want to have a play: https://onlinephp.io/c/a43b1

Does anyone have any suggestions?

user1432181
  • You skip the `<script>`. It's hard to understand; can you clarify or recheck your desired output? – bobble bubble Nov 14 '22 at 13:44
  • Btw. does not look like you need a callback, have a try with [this PHP demo at tio.run](https://tio.run/##ZVBda4MwFH33VwQRbpKWSfdqog@FsdHBhD027dA01FBNgqb7AP@7y3QbjF24F879Ovce17hpYkV5X0ZR0quzekccxSkbZK@dF/X@mLMDzW9owUS6JHNMn3cPJcH0joyitka8rcRAeXCxw8WIRUzw/ijiw4qIeMTwhSAAIKke4iwQNb5rAw076VekTxykNV4ZD8iabavlhUNVS0wgf7S96tCTmROAtBuuXWhC3rI0DOcR08Zd/bykAuQ/nOLQV@as4E@t/lf7/mXmydjPa7/XBRVckOOlV66tpMKLNmsEyYYGS27nuIE1mttJmFOysQvKpukT) - Regex [explained at regex101](https://regex101.com/r/alHnmC/1). Guessing yet that's what intended. – bobble bubble Nov 14 '22 at 13:56
  • Thanks BB - I didn't mean to skip `<script>`... yes, I DID want `<script>****abc();****</script>`. I think I do need the callback, as I actually need to call another PHP function once I've got the code isolated (I've adjusted the code sample above to show this now). – user1432181 Nov 14 '22 at 14:15
  • Hmm, why use `(*SKIP)(*F)` then? Have a look at [this regex101 demo](https://regex101.com/r/alHnmC/2). – bobble bubble Nov 14 '22 at 14:49
  • Thanks BB - I think that's working for me. I've placed a working php on https://onlinephp.io/c/a249d. – user1432181 Nov 14 '22 at 15:06

1 Answer


With help from Bobble Bubble, I've been able to get this working...

(Edit note, Jan '23: the following is a revised version of the answer. The previous version did not account for escaped quotes or for `.replace(/'/g` style cases.)

<?php

const regex = <<<'PATTERN'
/(<script\b[^><]*>)(.*?)(<\/script>)|\bon\w+\s*=\s*\K(?|(')([^'\\]*(?:(?:\\.|'(?=[^)(]*\)))[^'\\]*)*)'|(")([^"\\]*(?:(?:\\.|"(?=[^)(]*\)))[^"\\]*)*)")/ism
PATTERN;

const html=<<<'PATTERN'
<div id='content' onClick='abc()'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>abc();</script>

<div id='content'
         onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")'
    >Lorem On='abc' ipsum on to</div>


    <input id='a' type='range'
           onPress="xxx(document.getElementById(\"abc\"))"
           onSomething="yyy(\'fehrje\')"
           onSomethingElse="document.getElementById('content').innerHTML.replace(/"/g, \"dq\")">
    <input id='b' type='range'>

PATTERN;

function myFunction($tx) {
    return "****$tx****";
}


$regex = regex;
$html  = html;

$result = preg_replace_callback($regex,
        function ($matches) {
            // Groups 1-3: <script> opening tag, body, closing tag.
            // Groups 4-5: attribute quote and value (one numbering for both quote styles, via the (?| branch reset).
            // Trailing groups that did not take part in the match are absent from $matches, hence the ?? defaults.
            $m1 = $matches[1] ?? "";
            $m2 = $matches[2] ?? "";
            $m3 = $matches[3] ?? "";
            $m4 = $matches[4] ?? "";
            $m5 = $matches[5] ?? "";
            // Only one of $m2 (script body) or $m5 (attribute value) is non-empty for a given match.
            return $m1 . $m4 . myFunction($m2 . $m5) . $m3 . $m4;
        }, $html);


echo "Result=$result";
echo "\n\n";
?>

See https://onlinephp.io/c/ca781 for a running executable.
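
For anyone trying to read the pattern above, here is a stripped-down sketch of just the attribute half, as I understand it. It keeps the `\K` (which drops the already-consumed `onXxx=` from the reported match) and the `(?|...)` branch reset (which lets both quote styles share group numbers), but it deliberately leaves out the escaped-quote and `<script>` handling. `wrapJs` is a hypothetical stand-in for `myFunction`, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical stand-in for myFunction() from the answer above.
function wrapJs(string $js): string
{
    return "****{$js}****";
}

// Simplified pattern (assumption: no escaped quotes inside the attribute value).
// \bon\w+   an attribute name such as onClick; \w+ needs at least one character after "on"
// \K        forget everything matched so far, so only the quoted value gets replaced
// (?| ... ) branch reset: group 1 is the quote and group 2 the value in either branch
$pattern = <<<'PATTERN'
/\bon\w+\s*=\s*\K(?|(')([^']*)'|(")([^"]*)")/i
PATTERN;

$html = "<div onClick='abc()' onblur=\"xyz()\">Lorem On='abc' ipsum on to</div>";

$result = preg_replace_callback(
    $pattern,
    // Re-emit the original quote around the wrapped value.
    fn(array $m): string => $m[1] . wrapJs($m[2]) . $m[1],
    $html
);

echo $result, "\n";
```

Note that `On='abc'` in the text is left alone only because `\w+` requires something between "on" and the `=`. The pattern still does not check that it is inside a tag, so text such as `onfoo='x'` would be rewritten too.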

user1432181
  • [Another demo ↗](https://tio.run/##ZVHfb5swEH7nr7ihSLbTKKjb@hQgqvYyaZNWaY@BVmAcsAq2Z5ut07K/PT1M6FYVyeh@fXfffWc6cz6n@7vPd1G0sqIVT5ABSWjquJXGF/XhPk/Ldc7odr1nNC2SOZGzU1FrVfy6Ktw6w1d8ofsTjRk93MflFYtPtCCTUxD0JnOfLa0S6cgOx3V@6DOI00b@BNlkhGvlhfIEtPrUS/6YkarmlJH8q7ZigG8qBAhI48YBi8DrNEFwHqVSmdGHJhUB/9uIjNhKtYK8ytVvcpdlwpxduuwWL@xQC4OiPFhh@oqLB171fV3xRzpLtYHjqLiXWtHVwKI/EUCSgFTSQ2v1aGCoPO@EA3mEUTnhseCIu1S8Axo40OvNDYPb77CSzRPDOvpOOizEfocpVDIGi4lkYmQGMI9ZuG8ABbMQQ0isenH0WImg63KL/4/lP4hUTjYCZuAE@jFqL9wEu2g/I98H5M1/yBdplmkTxsq2uyA@vJnVaHB6EL6TqoVKNWCFHy3erBOoWt/P2mDtJR6Ib8kaP7Jd6Lz4YRRe5e8GwmEY2oJ3evZ25/Mz) – bobble bubble Nov 14 '22 at 15:19
  • I've just played with above demo (link too long for writing more), great you got it going! :) – bobble bubble Nov 14 '22 at 15:20
  • Hi, I have run into a new problem with this solution - in that escaped quotes are not being handled. I have updated the test with https://regex101.com/r/sRNTVI/2 but is there an easy fix to the regex to have the portions it finds to continue until their corresponding quotes? (e.g. `onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")'` should find `yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")` ?) (also note the additional problem of if the portion includes a `.replace(/"` type situation that isn't escaped but also needs to be handled). – user1432181 Jan 02 '23 at 12:30
  • You would need to use a [pattern that can deal with escaped quotes](https://stackoverflow.com/a/10786066/5527985): `$regex = '/( – bobble bubble Jan 02 '23 at 13:01
  • Your second requirement with the `.replace(` is more difficult. You can try to treat those quotes inside `(`...`)` like the escaped ones? For a very experimental idea see [this updated demo](https://regex101.com/r/0nGY5J/3): `$regex = '/( – bobble bubble Jan 02 '23 at 13:16
  • Excellent... again. Thank you BB. – user1432181 Jan 02 '23 at 14:13
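
The escaped-quote handling discussed in these comments can be tried in isolation. The sketch below is my own reduction of it, not the exact pattern from the linked demos: it matches one single-quoted value, treating any backslash-escaped character (including `\'`) as part of the value. It does not attempt the `.replace(/"/g` case, which needs the extra lookahead used in the answer.

```php
<?php
// "Unrolled loop" for a single-quoted value that may contain backslash escapes:
// [^'\\]*            run of ordinary characters (neither quote nor backslash)
// (?:\\.[^'\\]*)*    then any number of: one escaped character plus another ordinary run
$pattern = <<<'PATTERN'
/'([^'\\]*(?:\\.[^'\\]*)*)'/s
PATTERN;

// Test attribute taken from the comment above; nowdoc keeps the backslashes literal.
$attr = <<<'HTML'
onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")'
HTML;

if (preg_match($pattern, $attr, $m)) {
    // $m[1] holds the value between the outer quotes, escapes included.
    echo $m[1], "\n";
}
```

The double-quoted variant is the same pattern with `"` swapped in for `'`.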