
I have the following regex:

/<(?:textarea|select)[\s\S]*?>[\s\S]*?(\{\{\{variable:(.+?)\}\}\})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]+?(value=[\s\S]+?)(\{\{\{variable:(.+?)\}\}\})[\s\S]+?>|(\{\{\{variable:(.+?)\}\}\})/im

And this (shortened) HTML document:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Test</title>
</head>
<body>
    <section id="about">
        <div class="container about-container">
            <div class="row">
                <div class="col-md-12">
                    {{{block:welcome-intro}}}
                </div>
            </div>
        </div>
    </section>
    <section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testinput">Name<span class="form-validation-required"> * </span></label>

                    </div>
                <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testpassword">Password</label>

                    </div>
                <div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testtextarea">Biography</label>
                <span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
                    </div>
                <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}

{{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testsummernote">Interests</label>
                <span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
                    </div>
                <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
        </form>                </div>
            </div>
        </div>
    </section>
</body>
</html>

Parsing the above HTML document to find {{{variable:whatever}}} placeholders yields this result:

Array
(
    [0] => Array
        (
            [0] => {{{variable:system_version}}}
            [1] => {{{variable:contact-email-address}}}
            [2] => <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
                   <div class="row"><div class="col-sm-12 form-error"></div></div>
                   <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                   <div class="control-label"><label for="testinput">Name<span class="form-validation-required"> * </span></label></div>
                   <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div>
                   <input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}">
            [3] => <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}} {{{variable:system_login}}}</textarea>
            [4] => <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea>
        )
)
  • Indices [0] and [1] are correct, as they do not appear within a select/textarea/input tag.
  • Indices [3] and [4] are correct, because they are only encapsulated by one select/textarea/input tag.

I am still learning regexes and do not yet understand all the concepts, so please excuse any wrong terminology. It appears that the pattern is doing a greedy match of some sort. I expected to see only <input id="testinput"...{{{variable:...}}}"> at index [2].

The end goal is to only replace these placeholders with different data if they are not inside a textarea/select/input.

Why would index [2] match so many elements, and how can this be fixed?

Kobus Myburgh
  • Agreed. Parse the DOM first, then call upon regex. You need to filter out the unwanted elements with a tool that is DOM-aware -- regex is not DOM-aware. – mickmackusa Jul 25 '19 at 22:47
  • Hi again Emma. Thanks for responding. I am really struggling as you can see. I accepted your previous answer, as your efforts were working, even though I could not piece it together in code with two expressions to make it function. I ended up with one expression and a `str_replace()` to get this far. I would've done it with two expressions if I could get it to work. I spent hours trying to make it work that way. – Kobus Myburgh Jul 25 '19 at 22:48
  • What would my effort look like with two expressions? Here is my process: (1) I grab the output from CodeIgniter before sending it to the browser, (2) I find the placeholders, (3) I do a foreach, and if a match contains input/textarea/select, I temporarily replace the placeholder with something else, (4) I process the remaining ones to do the replace, then (5) I change the temporary values back to the original. To me it sounds logical that this should work, and I understand why you say it doesn't, but I don't know how to fix it :-( – Kobus Myburgh Jul 25 '19 at 22:53
  • @Emma you have incorrectly advised the use of the `m` pattern modifier in the previous question. – mickmackusa Jul 25 '19 at 22:53
  • @mickmackusa, with or without `m`, `i` or `im` I get incorrect results. – Kobus Myburgh Jul 25 '19 at 22:59
  • @Kobus Emma answers frequently, and people are copy-pasting the stuff that is posted on Stack Overflow. It is important to improve Emma's answers so that she isn't actively misleading or teaching researchers bad habits. There is a big difference between "answers that work" and "answers that will not fail". Furthermore, the slashes on the curly braces are not necessary and contribute to pattern bloat: https://regex101.com/r/MYmqw0/1 I am at work now and cannot volunteer further at this time. – mickmackusa Jul 25 '19 at 23:00
  • @Emma, I could use the DOM, I see no reason why not to, but since your previous solution seemed to work, I never looked into that again. – Kobus Myburgh Jul 25 '19 at 23:01
  • @mickmackusa fully understood. I just informed that I tried all the flags (including `g` which causes PHP errors) to try and save going over those again. – Kobus Myburgh Jul 25 '19 at 23:03
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197011/discussion-between-kobus-myburgh-and-mickmackusa). – Kobus Myburgh Jul 25 '19 at 23:04
  • I have posted a _proper_ answer to this question, on [your previous question](https://stackoverflow.com/a/57219765/2943403) which better describes what you require anyhow. – mickmackusa Jul 26 '19 at 12:10

1 Answer


Parsing HTML with regex is generally frowned upon, but this expression might be closer to what you have in mind, though I'm not certain:

<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})

It can also be improved. For instance, escaping the curly braces is unnecessary here, and the forward slash only needs escaping when `/` is the pattern delimiter:

<(?:textarea|select).*?>.*?({{{variable:(.*?)}}}).*?</(?:textarea|select)>|<(?:input).+?(value=.*?)({{{variable:(.+?)}}})?.*?>|({{{variable:(.*?)}}}) 

Here, the placeholder group in the input branch is made optional, so that the pattern can distinguish between inputs that contain a variable and those that do not.
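As for why index [2] swallowed several elements: a lazy quantifier such as `[\s\S]+?` is not confined to a single tag. The engine starts at the first `<input`, and when that tag holds no placeholder it keeps expanding across `>` characters until it finds one in a later tag. Below is a minimal sketch of that behaviour. It assumes attribute values never contain a literal `>`, and the sample string and variable names are illustrative only. Swapping in `[^>]` keeps the match inside one tag:

```php
<?php
// Two inputs: only the second one contains a placeholder.
$html = '<input type="hidden" value="abc" /><div>x</div>'
      . '<input type="text" value="{{{variable:system_name}}}">';

// Style used in the question: [\s\S]+? is free to cross tag boundaries.
$loose = '/<input[\s\S]+?value=[\s\S]+?\{\{\{variable:.+?\}\}\}[\s\S]+?>/i';

// Constrained variant: [^>] cannot leave the tag it started in.
$tight = '/<input[^>]+?value=[^>]*?\{\{\{variable:[^}]+\}\}\}[^>]*>/i';

preg_match($loose, $html, $looseMatch);
preg_match($tight, $html, $tightMatch);

// true: the loose match runs from the first <input to the end of the string.
var_dump($looseMatch[0] === $html);

// Only the second <input ...> tag is matched.
var_dump($tightMatch[0]);
```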

Demo

Test

$re = '/<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})/si';
$str = '<section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);
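
For the end goal stated in the question (replacing placeholders only when they are outside textarea/select/input), one possible approach is to let the first alternatives consume the protected elements whole and have a callback return them unchanged, so that only the last alternative triggers a replacement. The sketch below is not a definitive implementation: `replaceOutsideFormFields()` and the `$values` map are hypothetical names, and the pattern assumes attribute values contain no literal `>`. A DOM parser, as suggested in the comments, would likely be more robust.

```php
<?php
// Hypothetical helper: replaces {{{variable:name}}} only outside form fields.
function replaceOutsideFormFields(string $html, array $values): string
{
    $pattern = '~<(textarea|select)\b[^>]*>.*?</\1>'  // whole textarea/select element
             . '|<input\b[^>]*>'                        // whole input tag
             . '|\{\{\{variable:([^}]+)\}\}\}~si';      // bare placeholder

    return preg_replace_callback($pattern, function (array $m) use ($values) {
        // Group 2 is present only when the bare-placeholder alternative matched.
        if (isset($m[2])) {
            // Unknown variable names are left untouched.
            return $values[$m[2]] ?? $m[0];
        }
        // Protected element: return it unchanged.
        return $m[0];
    }, $html);
}

// Illustrative input, not taken from the original document.
$html = '<p>Version: {{{variable:system_version}}}</p>'
      . '<input type="text" value="{{{variable:system_name}}}">'
      . '<textarea>{{{variable:system_login}}}</textarea>';

$result = replaceOutsideFormFields($html, [
    'system_version' => '1.2.3',
    'system_name'    => 'Example',
]);

echo $result;
```

With this input, only the placeholder inside the `<p>` is replaced. The ones inside the `<input>` and `<textarea>` are returned as they were.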
Emma
  • See my comment about excess slashes in the pattern. – mickmackusa Jul 26 '19 at 03:25
  • @mickmackusa, thanks. I take note of your escaping comment, and will check your modified regex shortly. – Kobus Myburgh Jul 26 '19 at 07:15
  • Thanks for everything. @Emma, your solution works, thank you. I am taking note of everyone's comments that regex is not the best option for parsing HTML, and I may change it, but for now, with my question requesting regex help, this answer is the correct one. – Kobus Myburgh Jul 27 '19 at 17:36