
I have an issue with PHP's preg_match(). I tried using preg_match() to check this string:

$txt = 'dog.cat' . str_repeat('.dog.cat', 818); // stands in for the long pasted literal: 1638 dot-separated words


echo count(explode('.', $txt)) . "<br>";
echo strlen($txt) , "<br>";
if(preg_match("/[a-z]+(?:([.][a-z]+)*)/i", $txt)) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
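Separately from the limit, the pattern above is not anchored, so preg_match() returns 1 for any string that merely contains a letter (for example `dog..cat!`). Below is a minimal sketch of a stricter check, assuming the goal is letters-only words separated by single dots. The function name `isDotSeparatedAlpha` is hypothetical. It relies on the assumption that one character-class run avoids the per-repetition stack growth of a repeated group like `(?:\.[a-z]+)*`. That has not been measured here.

```php
<?php
// Hypothetical validator: letters-only words separated by single dots.
// Uses one character-class run instead of a repeated group, plus a second
// check that rejects empty words.
function isDotSeparatedAlpha(string $s): bool
{
    // Only letters and dots, anchored at both ends with \A and \z.
    if (preg_match('/\A[a-z.]+\z/i', $s) !== 1) {
        return false; // disallowed character, empty string, or a preg error
    }
    // Reject a leading dot, a trailing dot, or two dots in a row.
    return preg_match('/\A\.|\.\.|\.\z/', $s) === 0;
}

$txt = 'dog.cat' . str_repeat('.dog.cat', 818);
var_export(isDotSeparatedAlpha($txt));
echo "\n";
var_export(isDotSeparatedAlpha('dog..cat!'));
echo "\n";
```

Any preg failure makes the function return false, so it fails closed rather than silently accepting input.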

But I think preg_match() has a limit. The total count after exploding on the dot is 1638, and at more than 1638 it returns "not match". Yet when I try the same regex on phpliveregex or regex101, it matches.
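preg_match() returns 1 on a match, 0 when there is no match, and false when matching fails (for example when an internal limit is hit). The `if` above treats false the same as 0, so an error is reported as "not match". A small sketch that keeps the three cases apart; the helper name `checkPattern` is hypothetical, and the result for the long string depends on the PHP version and JIT settings.

```php
<?php
// Hypothetical helper: distinguish "matched", "did not match", and "failed".
// A plain `if (preg_match(...))` cannot tell the last two apart.
function checkPattern(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // Matching failed; preg_last_error() holds the PREG_* error code.
        return 'error ' . preg_last_error();
    }
    return $result === 1 ? 'match' : 'no match';
}

$txt = 'dog.cat' . str_repeat('.dog.cat', 818); // 1638 words, as in the question
echo checkPattern('/[a-z]+(?:([.][a-z]+)*)/i', $txt), "\n";
```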

Here is the complete explanation: I created a program that checks the format of the input.

This continues from my earlier question: Javascript string match specific regex.

The user can enter anything in a textarea. If the input is in a correct format, the program retrieves the data. If the user enters a wrong format, I should remove the wrong parts with preg_replace() and keep the correct format. If none of the input is in a correct format, the program returns an error message.

This is not just one format; there are about 10 formats I need to check, so it is not as simple as exploding on ., +, or * or using ctype_alpha().

In short, neither the program nor the product owner cares what the user types into the textarea; users can mix all 10 formats in it. As long as I can check the input and it passes one of the formats, everything is fine.

Here is an example of wrong input that I should fix into the correct format:

150-50-30----20=50+dog......cat.......cow.....chicken,,,,.,.,pencil,
dog,,,.,cat.,.,.chicken......50-20-10-5=15+1*2*3*4=24+50-50*30*20=0+*4*8=32

After correcting the format, it should look like this:

150-50-30-20=50+dog.cat.cow.chicken.pencil.dog.cat.chicken.50-20-10-5=15+1*2*3*4=24+50-50*30*20=0+4*8=32
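A minimal sketch of the clean-up step, assuming only two rules inferred from this single example: any run of dots, commas, or whitespace collapses into one dot, and a run of operator characters (`-`, `+`, `*`, `=`) keeps only its first character. The other formats are not shown in the question, so this is not a general solution, and `normalizeInput` is a hypothetical name.

```php
<?php
// Hypothetical clean-up rules inferred from the one example above.
function normalizeInput(string $raw): string
{
    // 1. Collapse any run of dots, commas, or whitespace (including the line break) into one dot.
    $s = preg_replace('/[.,\s]+/', '.', $raw);
    // 2. In a run of operators such as "----" or "+*", keep only the first one.
    $s = preg_replace('/([-+*=])[-+*=]+/', '$1', $s);
    // 3. Drop a dot left over at either end of the string.
    return trim($s, '.');
}

$raw = '150-50-30----20=50+dog......cat.......cow.....chicken,,,,.,.,pencil,' . "\n"
     . 'dog,,,.,cat.,.,.chicken......50-20-10-5=15+1*2*3*4=24+50-50*30*20=0+*4*8=32';

echo normalizeInput($raw), "\n";
// 150-50-30-20=50+dog.cat.cow.chicken.pencil.dog.cat.chicken.50-20-10-5=15+1*2*3*4=24+50-50*30*20=0+4*8=32
```

Because the operator rule keeps only the first character of a run, an input such as `5=-3` would become `5=3`; whether that is acceptable depends on the other formats.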

Can anyone help with this issue?

Willyanto Halim
  • Can you let me know what the purpose of this regex is? What kind of matches is it made for? – nice_dev Aug 02 '21 at 05:24
  • @nice_dev The purpose is to check whether the input is valid and matches; if so, I want to retrieve the input. Currently I cannot retrieve it because it doesn't match. – Willyanto Halim Aug 02 '21 at 06:01
  • What is that input and what is the expected match result? – nice_dev Aug 02 '21 at 06:01
  • @nice_dev I would like to turn `$txt` into an array and store it in the database. That's just an example, not the real case. In the real case, when a user enters text like that, I have to check whether the format is correct; if it matches the regex, I break it down and save it to the database. – Willyanto Halim Aug 02 '21 at 06:10
  • Smaller reproducible snippet: https://3v4l.org/qV6EH#v7.2.0 Also, you don't need all of the parentheticals. `[a-z]+(\.[a-z]+)*` – mickmackusa Aug 02 '21 at 06:15
  • I notice PHP versions < 7.2 will not match. Curiously, 5.6 and 4.4 do match. Guessing there's a limit exceeded somewhere that should return an error, but instead silently fails. FYI, PHP 7.2, released in 2017, is EOL since 1 Dec 2020. You really should upgrade. – Markus AO Aug 02 '21 at 06:18
  • @WillyantoHalim Perhaps, there are better ways to check for a match? Is the match about dog should be followed by a cat and then by a dog and so on? – nice_dev Aug 02 '21 at 06:49
  • Nope. The user can input anything and I have to filter out the right input. The content of `$txt` is the right format, but users can enter other things too, so I need to keep the correct data, and when the user enters a wrong format, return an error saying the format is wrong. @nice_dev – Willyanto Halim Aug 02 '21 at 06:57
  • @WillyantoHalim Exactly. Let user input anything. I am asking what are the different correct _formats_ of the user input possible? Please don't make it hard for us anymore and add those possible _formats_ by editing your question. – nice_dev Aug 02 '21 at 07:09
  • I have updated the question @nice_dev – Willyanto Halim Aug 02 '21 at 07:31
  • We cannot guess what your 10 different formats are. Please enlighten us. From what I see, there are no alphabeticals in your linked earlier question. – mickmackusa Aug 02 '21 at 07:33
  • Yes, this case is alphabetical, and the earlier question is about digits. So currently there are only 2 formats I can show you, and the user can combine those formats in the textarea. – Willyanto Halim Aug 02 '21 at 07:35
  • The question is now updated @mickmackusa – Willyanto Halim Aug 02 '21 at 07:44
  • So, the goal posts have completely shifted from validation to sanitization? – mickmackusa Aug 02 '21 at 08:07
  • I already sanitized the input; when the input exceeds the limit, it returns "not match". – Willyanto Halim Aug 02 '21 at 08:09
  • Perhaps to solve this XY Problem, we should better understand why need to receive these excessively long (seemingly limitless) formatted strings. Can you not impose a character limit? Can you not have your end users deliver their data in a different fashion? After the string has been sanitized and validated, what happens next? After knowing that, perhaps we can devise better / more direct techniques to achieve what you truly need. – mickmackusa Aug 03 '21 at 23:17

1 Answer

It appears that you have reached the memory limit on "capturing". If you simply don't capture any substrings, the match succeeds at these lengths.

Code: (Demo)

$txt = 'dog.cat' . str_repeat('.dog.cat', 815);

for ($i = 0; $i < 5; ++$i) {
    $txt .= '.dog.cat';
    echo count(explode('.', $txt)) . "\n";
    echo strlen($txt) . "\n";
    if (preg_match("/[a-z]+(?:\.[a-z]+)*/i", $txt, $match)) {
        echo "A match was found." . strlen($match[0]);
    } else {
        echo "A match was not found.";
    }
    echo "\n---\n";
}

Output:

1634
6535
A match was found.6535
---
1636
6543
A match was found.6543
---
1638
6551
A match was found.6551
---
1640
6559
A match was found.6559
---
1642
6567
A match was found.6567
---

About the error: https://3v4l.org/RjfiN#v7.2.0

You are getting error code 6 from preg_last_error(), which is the constant:

PREG_JIT_STACKLIMIT_ERROR

preg_last_error_msg() isn't available until PHP 8.
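On PHP 7, where preg_last_error_msg() does not exist, the numeric code can be mapped to its constant name by hand. A minimal sketch; `pregErrorName` is a hypothetical name, and the table lists the PREG_*_ERROR constants available in PHP 7.

```php
<?php
// Hypothetical helper for PHP 7.x: turn the integer returned by
// preg_last_error() into the name of its PREG_* constant.
function pregErrorName(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR', // value 6, PHP 7.0+
    ];
    return $names[$code] ?? "unknown PCRE error ($code)";
}

$txt = 'dog.cat' . str_repeat('.dog.cat', 818);
// Only report when preg_match() actually failed (false), not when it found no match (0).
if (preg_match('/[a-z]+(?:([.][a-z]+)*)/i', $txt) === false) {
    echo pregErrorName(preg_last_error()), "\n";
}
```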

I see some good answers at: PHP PREG_JIT_STACKLIMIT_ERROR - inefficient regex


If preg_match() is proving to be unsuitable/miserable for your requirements, then stop using it. There are other available tools to determine if a string is strictly comprised of dot-separated alphabetical substrings.

function isDotSeparatedAlphabetical($string) {
    foreach (explode('.', $string) as $word) {
        if (!ctype_alpha($word)) {
            return false;
        }
    }
    return true;
}

$txt = 'dog.cat' . str_repeat('.dog.cat', 815);

for ($i = 0; $i < 5; ++$i) {
    $txt .= '.dog.cat';
    var_export(['length' => strlen($txt), 'verdict' => isDotSeparatedAlphabetical($txt)]);
    echo "\n---\n";
}
mickmackusa
  • Hello, thanks for the answer, I really appreciate it. But when I increase it to 2732, it no longer matches; the max is 2731. How can I increase the limit through php.ini? – Willyanto Halim Aug 02 '21 at 06:48
  • No matter what you set it to, you are going to find a way to hit the next limit. Why not rethink your application and devise a non-preg solution? How about exploding on dots and checking if any element in the array fails `ctype_alpha()`? I agree with nice_dev, if you have more complex data requirements than just dot-delimited alphabetical strings, then we will need to see a more realistic sample string. – mickmackusa Aug 02 '21 at 06:49
  • I have set a limit through a condition with a max of 3700 items; more than that shows an error message. Currently I need 3700 items. It has to use regex, because the user can input anything in the textarea; I have to filter what the user enters, extract the right format, and save it to the database. – Willyanto Halim Aug 02 '21 at 06:52
  • So currently I have set pcre.jit to 0 in php.ini. Is that safe? It works now that I've set it to 0. – Willyanto Halim Aug 02 '21 at 07:07
  • Think about what you've just done. You are already not trusting the end user to "do the right thing", now you've reconfigured your system to allow naughty end users to abuse your server by passing in HUGE strings and forcing it to consume resources. I do not recommend your approach and strongly urge you to consider my very simple and efficient `ctype_alpha()` approach. – mickmackusa Aug 02 '21 at 07:09
  • Hello, I have updated the question. As I said before, this is not a simple case where I could just return an error when the format is incorrect. I have to check for the correct format and retrieve only the correct parts; the example above is the corrected format that I already retrieve. There are about 10 formats I should check, not just this one, so the correction involves more than one condition. – Willyanto Halim Aug 02 '21 at 07:33
  • At this point, it may be better to abandon this page because the requirements are blurry. Perhaps ask a new question where you provide all 10 different formats and ask about how to sanitize the values. – mickmackusa Aug 04 '21 at 23:03