
I should first apologize for what is probably a rookie question, but as a complete regex newbie I have no clue how to approach this relatively complex task. I need to specify a validation pattern for a string input and run separate checks on the separate segments of that pattern. I'm working with PHP 7.0 on Laravel 5.4 (which should not make any difference), and the input must follow this pattern:

header1: expression1; header2: expression2; header3: expression3 //etc...

I need to check that each header is present and that it appears in a whitelist of available headers, so I need to extract each header.

Furthermore the expressions are built as follows

expression1 = (a1 + a2)*(a3-a1)
expression2 = b1*(b2 - b3)/b4
//etc...

The point is that each expression contains some numeric parameters which should form a valid arithmetic calculation. Those parameters should also be contained in a special list of available parameter placeholders, so I'd need to check them too. So, is there a simple efficient way (using regex and string analysis in pure php) to specify that strict structure or should I do everything step by step with exploding and try-catching?

An ideal solution would be some shorthand logic (or a regex?) along these lines:

$value->match("^n(header: expression)")
->delimitedBy(';')
->where(in_array($header, $allowed_headers))
->where(strtr($expression, array_fill_keys($available_param_placeholders, 0))->isValidArithmeticExpression())

I hope you can follow my logic. The code above would read as: match N repetitions of the pattern "header: expression", delimited by ';', where 'header' (given that $header is its value) is in an array and where 'expression' (given that $expression is its value) forms a valid arithmetic expression once all available parameter placeholders have been replaced by 0. That's all. Any deviation from that strict pattern should return false.

As an alternative, I'm currently thinking of first exploding the string by the main delimiter (the semicolon) and then analysing each part separately. I would then have to check whether a colon is present, whether everything to the left of the colon is a valid header name, and whether everything to the right of the colon forms a valid arithmetic expression when all param names from the list are replaced by some value (like 0, just to check that the code executes, which I also don't know how to do). Anyway, that way seems like overkill and I'm sure there is a smoother way to specify the needed pattern.
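
The explode-and-check approach described above could be sketched roughly as follows. The function name `splitAndCheckHeaders` and the sample whitelist are made up for illustration, and the expression part is only checked for being non-empty here, not for arithmetic validity.

```php
<?php
// Hypothetical helper (not from the question): returns [header => expression]
// or false as soon as one segment deviates from "header: expression".
function splitAndCheckHeaders($input, array $allowedHeaders)
{
    $result = [];
    foreach (explode(';', $input) as $segment) {
        // Limit 2: only the first colon separates header from expression.
        $parts = explode(':', $segment, 2);
        if (count($parts) !== 2) {
            return false; // no colon in this segment
        }
        $header = trim($parts[0]);
        $expression = trim($parts[1]);
        if (!in_array($header, $allowedHeaders, true) || $expression === '') {
            return false; // unknown header or missing expression
        }
        $result[$header] = $expression;
    }
    return $result;
}

$allowed = ['header1', 'header2']; // assumed whitelist
var_dump(splitAndCheckHeaders('header1: (a1 + a2)*(a3-a1); header2: b1/b4', $allowed));
var_dump(splitAndCheckHeaders('header1: a1; unknown: a2', $allowed));
```

Returning the parsed pairs instead of just `true` keeps the expressions available for whatever arithmetic check comes next.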

I hope I've explained everything well enough, and sorry if I'm being too messy in explaining my problem. Thanks in advance for any piece of advice/help! Greatly appreciated!

D. Petrov
  • My first instinct is to break it into steps. Don't try to do it all in one line. What is this, perl? So maybe explode on the delimiter, and get all the headers in an array. Then check each of the headers using a foreach. Etc. That's my general approach. – Will Hines Dec 30 '17 at 02:56
  • the hardest part would be parsing out the parameters. Am I right in guessing that's the toughest part? – Will Hines Dec 30 '17 at 02:58
  • You're absolutely correct. I've proceeded so far with what I've been thinking of, but it's so tough to check if all the requirements are generally fulfilled. And yes, analysing the arithmetic expression is for sure the toughest part. – D. Petrov Dec 30 '17 at 03:01
  • The biggest problem here is that I can't handle eventual parse errors when eval()'ing the `expression`. What could be a good alternative to check if that string actually returns a numeric value when executed? – D. Petrov Dec 30 '17 at 03:23
  • ah. i'm just now grasping that you need to evaluate the expressions if numbers get plugged in. Yeah, that's a bit tricky. I can only think of the eval() expression which I am scared to use! – Will Hines Dec 30 '17 at 03:34
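
One alternative to eval() raised by the comments above is a small hand-written parser that evaluates the expression itself, so malformed input never reaches the PHP interpreter. The following is only a sketch under assumptions: the class name `ArithmeticParser`, the grammar in the comment and the choice to return null on any problem (including division by zero) are illustrative, not something from the thread.

```php
<?php
// Hypothetical recursive-descent evaluator. Assumed grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := ('+' | '-') factor | number | name | '(' expr ')'
final class ArithmeticParser
{
    private $tokens;
    private $values;
    private $pos = 0;

    private function __construct(array $tokens, array $values)
    {
        $this->tokens = $tokens;
        $this->values = $values;
    }

    // Returns a float, or null when the expression is malformed, uses an
    // unknown name, or divides by zero.
    public static function evaluate($expression, array $values)
    {
        preg_match_all('~\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]~', $expression, $m);
        // If joining the tokens does not reproduce the input (whitespace
        // aside), some character was not covered by any token.
        if (implode('', $m[0]) !== preg_replace('~\s+~', '', $expression)) {
            return null;
        }
        $parser = new self($m[0], $values);
        try {
            $result = $parser->expr();
        } catch (InvalidArgumentException $e) {
            return null;
        }
        // Leftover tokens mean trailing garbage such as "a1 a2".
        return $parser->pos === count($parser->tokens) ? $result : null;
    }

    private function peek()
    {
        return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
    }

    private function expr()
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term()
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->factor();
            if ($op === '/' && $rhs == 0) {
                throw new InvalidArgumentException('Division by zero');
            }
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor()
    {
        $token = $this->peek();
        $this->pos++;
        if ($token === '+' || $token === '-') { // unary sign
            $value = $this->factor();
            return $token === '-' ? -$value : $value;
        }
        if ($token === '(') {
            $value = $this->expr();
            if ($this->peek() !== ')') {
                throw new InvalidArgumentException('Missing closing parenthesis');
            }
            $this->pos++;
            return $value;
        }
        if ($token !== null && is_numeric($token)) {
            return (float) $token;
        }
        if ($token !== null && array_key_exists($token, $this->values)) {
            return (float) $this->values[$token];
        }
        throw new InvalidArgumentException('Unexpected token');
    }
}

var_dump(ArithmeticParser::evaluate('(a1 + a2)*(a3-a1)', ['a1' => 1, 'a2' => 2, 'a3' => 3]));
var_dump(ArithmeticParser::evaluate('c1 * (((c2)', ['c1' => 1, 'c2' => 2]));
```

Because division by zero also yields null here, substituting 0 for every placeholder would reject expressions like `b1*(b2 - b3)/b4`; test values need to be chosen with that in mind.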

3 Answers


Using eval() must always be Plan Z. With my understanding of your input string, this method may sufficiently validate the headers and expressions (if not, I think it should sufficiently sanitize the string for arithmetic parsing). I don't code in Laravel, so if this can be converted to Laravel syntax I'll leave that job for you.

Code: (Demo)

$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: c1 * (((c2); header4: ((a1 * (a2 - b1))/(a3-a1))+b2";
$allowed_headers=['header1','header3','header4'];

$pairs=explode('; ',$test);
foreach($pairs as $pair){
    list($header,$expression)=explode(': ',$pair,2);
    if(!in_array($header,$allowed_headers)){
        echo "$header is not permitted.";
    }elseif(!preg_match('~^((?:[-+*/ ]+|[a-z]\d+|\((?1)\))*)$~',$expression)){  // based on https://stackoverflow.com/a/562729/2943403
        echo "Invalid expression @ $header: $expression";
    }else{
        echo "$header passed.";
    }
    echo "\n---\n";
}

Output:

header1 passed.
---
header2 is not permitted.
---
Invalid expression @ header3: c1 * (((c2)
---
header4 passed.
---

I will admit the above pattern will match (+ )( +), so it is not the best pattern. So perhaps your question may be a candidate for using eval(). That said, you may first want to research some of the GitHub projects / plugins / parsers that can parse or tokenize an arithmetic expression.
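
As a rough sketch of how the pattern could be tightened so that input such as (+ )( +) is rejected, the variant below requires an operand on both sides of every binary operator. The grammar, the constant name `STRICT_EXPRESSION` and the function name are assumptions for illustration; operands are limited to placeholders shaped like `a1`, as in the pattern above.

```php
<?php
// Assumed grammar: a term is an optionally signed placeholder (a letter
// followed by digits) or a parenthesised expression; an expression is a
// term followed by any number of "operator term" pairs.
const STRICT_EXPRESSION = '~^(?<expr>(?<term>\s*[-+]?\s*(?:[a-z]\d+|\((?&expr)\))\s*)(?:[-+*/](?&term))*)$~';

function isWellFormedExpression($expression)
{
    return preg_match(STRICT_EXPRESSION, $expression) === 1;
}

foreach (['(a1 + a2)*(a3-a1)', 'c1 * (((c2)', '(+ )( +)'] as $candidate) {
    echo $candidate, ' => ', var_export(isWellFormedExpression($candidate), true), "\n";
}
```

This only checks the shape of the expression; it does not check placeholder names against a list or guard against division by zero.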

Perhaps:

Any $pair that gets past the if and the elseif can move onto the evaluation process in the else.

I'll give you a headstart/hint about some general handling, but I'll shy away from giving any direct instruction to avoid the wrath of a certain population of critics.

}else{
    // replace all variables with 0
    //$expression=preg_replace('/[a-z]\d+/','0',$expression);
    // or replace each unique variable with a whole number
    $expression=preg_match_all('/[a-z]\d+/',$expression,$out)?strtr($expression,array_flip($out[0])):$expression;  // variables become incremented whole numbers
    // ... from here use $expression with eval() in a style/intent of your choosing.
    // ... set a battery of try and catch statements to handle unsavory outcomes.
    // https://www.sitepoint.com/a-crash-course-of-changes-to-exception-handling-in-php-7/
}
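
Continuing the hint above, one possible shape for the eval() step is sketched below. It assumes every placeholder has already been replaced by a number; the function name `evalNumericExpression` and the character whitelist are illustrative choices, not a vetted security measure.

```php
<?php
// Hypothetical wrapper around eval(). Only digits, whitespace, dots,
// parentheses and the four operators are allowed to reach eval().
function evalNumericExpression($expression)
{
    if (!preg_match('~^[\d\s.()+*/-]+$~', $expression)) {
        return null; // refuse anything that could be more than arithmetic
    }
    try {
        // @ silences the "Division by zero" warning that PHP 7 emits.
        $result = @eval('return (' . $expression . ');');
    } catch (\Throwable $e) {
        // ParseError for malformed input; DivisionByZeroError on PHP 8.
        return null;
    }
    // PHP 7 returns INF or NAN when dividing by zero with "/".
    return (is_int($result) || is_float($result)) && is_finite($result) ? $result : null;
}

var_dump(evalNumericExpression('(1 + 2)*(3-1)'));
var_dump(evalNumericExpression('1 * (((2)'));
```

Catching `\Throwable` covers both the parse errors and the runtime errors that PHP 7 and later raise as exceptions.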
mickmackusa
  • @D.Petrov If this solution doesn't cover all of your cases, please update your question to include the fringe cases and leave me a comment. I'll see if I can adjust my method to suit. – mickmackusa Jan 03 '18 at 21:47
  • Really, really appreciated, and thanks a lot for the help! I'm going to try it out as soon as possible, but it really seems like exactly the type of solution I need! As you say, `eval()` isn't really the best option, but is there an alternative way to replace the parameters with their numeric values and retrieve the result once the calculation is done? Thanks a lot again! – D. Petrov Jan 04 '18 at 03:30
  • @D.Petrov I extended my answer a little for you. I hope this helps. – mickmackusa Jan 04 '18 at 11:50
  • Hey, I really appreciate your help! I'm currently on my way home, sitting in a traffic jam, but I was working on an extended regex pattern earlier on. I will share the final outcome with you, as it is based on your answer as a head start! The evaluation will be the easiest part once all the data has been properly verified and parsed :) – D. Petrov Jan 04 '18 at 11:54
  • I'll be happy to add my 2 cents about your new pattern. – mickmackusa Jan 04 '18 at 11:56
  • ...before I go to bed, here is something to fool around / test with: http://sandbox.onlinephpfunctions.com/code/4bb64a116a7ffa23fbd382e4eafcf702d64ffd93 – mickmackusa Jan 04 '18 at 12:21
  • Here's the expression I've come up with so far. I haven't tested every possible failure scenario, but so far it covers the whole principle of the pattern well enough. Thanks again, I'll be waiting for comments/recommendations in case you have any. :) `^(\w+:([-+]?(?'expr'(\w+|[-+]?\(\g<expr>\))([-+*\/]\g<expr>)?)))+(;\g<1>)*$` – D. Petrov Jan 04 '18 at 19:23
  • Would you mind applying your pattern to a regex101.com demo and sending me the link that proves your pattern successful on a particular input string? – mickmackusa Jan 05 '18 at 13:36
  • As of right now I've even replaced `\w+` with `[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*` to match either valid PHP variable names only (as I've decided I wouldn't want to accept e.g. '1param' as a valid param name, but only 'param1' and so on) or a corresponding numeric value, for which I've added `|\d+(\.\d+)?` as a third option. You can check out the final result here: https://regex101.com/r/UHMrqL/1 – D. Petrov Jan 05 '18 at 14:54
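$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: expression3";
// Assumed whitelists; substitute your own lists here.
$allowed_headers = ['header1', 'header2', 'header3'];
$allowed_params = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'b4'];

// Hypothetical helper: true when $param is not whitelisted.
function param_fails($param, array $allowed_params) {
    return !in_array($param, $allowed_params, true);
}

$pairs = explode(';', $test);
$headers = [];
$expressions = [];
foreach ($pairs as $p) {
    // Limit of 2 keeps any further colons inside the expression part.
    $he = explode(':', $p, 2);
    if (count($he) !== 2) {
        return false; // no colon, so no header/expression pair
    }
    $headers[] = trim($he[0]);
    $expressions[] = trim($he[1]);
}
foreach ($headers as $h) {
    if (!in_array($h, $allowed_headers)) {
        return false;
    }
}

foreach ($expressions as $e) {
    preg_match_all('/[a-z0-9]+/', $e, $matches);
    // $matches[0] holds the full matches; $matches itself is an array of arrays.
    foreach ($matches[0] as $m) {
        if (param_fails($m, $allowed_params)) {
            echo "Expression $e contains forbidden param $m.";
        }
    }
}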
Will Hines

Regex turned out to be less complicated than I thought when I posted the question, so I managed to build the complete pattern myself, thanks to the initial head start from @mickmackusa. Here is what I finally came up with, explained by regex101 itself: https://regex101.com/r/UHMrqL/1 The logic it is based on is described in the question. The only thing missing is verifying the header values and the param names, but that's easy to do afterwards by extracting them with preg_match_all and checking them in plain PHP. Thanks again for the attention and the help! :)
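
The follow-up check on param names mentioned above might look something like this sketch. The function name `findUnknownParams` and the sample list are hypothetical; the identifier pattern is the one quoted in the comments under the first answer.

```php
<?php
// Hypothetical helper: lists every identifier found in $expression that is
// not part of $allowedParams.
function findUnknownParams($expression, array $allowedParams)
{
    // Identifier shape taken from the comments: a valid PHP variable name.
    preg_match_all('~[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*~', $expression, $matches);
    // array_values() re-indexes the result so it compares cleanly with [].
    return array_values(array_diff(array_unique($matches[0]), $allowedParams));
}

$allowedParams = ['b1', 'b2', 'b4']; // assumed whitelist
var_dump(findUnknownParams('b1*(b2 - x9)/b4', $allowedParams));
```

An empty array means every placeholder in the expression is on the list.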

D. Petrov