7

It appears that PHP's `preg_match` has a limit of 3276 on the repetition count of a quantified group in some cases.

For example:

`^(.|\s){0,3276}$` works, but `^(.|\s){0,3277}$` does not.

The limit doesn't always apply, as `/^(.){0,3277}$/` works.

I can't find this mentioned anywhere in PHP's documentation or the bug tracker. The number 3276 seems an odd boundary; the only thing I can think of is that it's approximately one tenth of 32767, which is the maximum value of a signed 16-bit integer.

`preg_last_error()` returns 0.

I've reproduced the issue on http://www.phpliveregex.com/ as well as my local system and the webserver.

EDIT: It looks like the code is emitting "Warning: preg_match(): Compilation failed: regular expression is too large at offset 16", so it appears to be the same issue as the question "PHP preg_match_all limit".

However, the regex itself isn't very large. Does PHP do some kind of expansion of repeated groups that makes the compiled pattern too large?
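The observations above can be reproduced with a small script. This is only a sketch, assuming PHP 7 or later; `try_pattern` is a hypothetical helper name, not part of PHP. It compiles each pattern against an empty subject and records the return value, any warning text, and the value of `preg_last_error()`, so a compile-time warning stays visible even when warnings are not displayed. The results may differ between PHP and PCRE versions.

```php
<?php
// Hypothetical helper: run preg_match() on an empty subject and collect
// the return value, any warning PHP raises, and preg_last_error().
function try_pattern(string $pattern): array
{
    $warning = null;
    // Temporarily capture warnings instead of letting PHP print them.
    set_error_handler(function (int $errno, string $errstr) use (&$warning): bool {
        $warning = $errstr;
        return true; // mark the warning as handled
    });
    try {
        $result = preg_match($pattern, '');
    } finally {
        restore_error_handler();
    }
    return [
        'result'     => $result,            // 1, 0, or false on failure
        'warning'    => $warning,           // null if nothing was raised
        'last_error' => preg_last_error(),  // runtime error code
    ];
}

// The three patterns from the question.
$patterns = ['/^(.|\s){0,3276}$/', '/^(.|\s){0,3277}$/', '/^(.){0,3277}$/'];
foreach ($patterns as $pattern) {
    echo $pattern, "\n";
    var_dump(try_pattern($pattern));
}
```

Comparing `result === false` with `0` distinguishes "the pattern failed to compile" from "the pattern did not match", which is easy to miss when only the truthiness of the return value is checked.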

Stu
  • It's more than likely a memory boundary. preg_match will return an array with that many elements, so you need enough memory to allow for a large array. Try bumping up your max memory limit and see if it changes. – Patrick Evans Jul 29 '13 at 14:10
  • Did you check `preg_last_error()`? – Jason McCreary Jul 29 '13 at 14:11
  • Is your error reporting on? There should be some kind of error if you use too much memory. – x4rf41 Jul 29 '13 at 14:11
  • possible duplicate of [PHP preg\_match\_all limit](http://stackoverflow.com/questions/8268624/php-preg-match-all-limit) (I suspect that this is the error you're getting, and is simply being suppressed). – cmbuckley Jul 29 '13 at 14:11
  • `preg_last_error()` returns 0, I'll add that to the post. – Stu Jul 29 '13 at 14:26
  • I guess the code in the question is just an example you've come up with to demo the problem rather than your real-world code, right? (because I can't really see why you wouldn't be starting with `strlen()` rather than regex for this example) – Spudley Jul 29 '13 at 14:29
  • The regex examples are based on those provided by a client. Basically, we have a system where the user can enter their own validation regexes for form fields, they can't enter PHP code. – Stu Jul 29 '13 at 14:31
  • Why would anyone need a pattern like `(.|\s)`? That's "any character" or "a whitespace character". The latter is included in the former anyway, so the pattern is redundant. It'll still cause the parser to do loads of back-tracking though. – Spudley Jul 30 '13 at 19:15
  • Oh, and just a thought re "the client can enter their own regex". Be careful not to allow them to enter the `e` modifier on the end, as that could result in them running arbitrary PHP code. – Spudley Jul 30 '13 at 19:16
  • What happens if you leave out the upper limit, like so: `^(.|\s){0,}$` – centuren Jul 30 '13 at 19:10
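To illustrate the point raised in the comments about `(.|\s)`: with the `s` (dot-all) pattern modifier, `.` also matches newlines, so an "any character" length check can be written as a single-character repeat with no alternation group. The following is a minimal sketch, assuming the intent of `(.|\s)` is "any character, including newlines" and that the maximum stays within PCRE's quantifier limit; `fits_length` is a hypothetical helper name. It is an alternative formulation, not something proposed in the thread.

```php
<?php
// Hypothetical helper: true if $text is at most $max characters long.
// Without the "u" modifier, "characters" here means bytes.
function fits_length(string $text, int $max): bool
{
    // The "s" modifier lets "." match newlines as well, so no
    // alternation group such as (.|\s) is needed.
    $pattern = '/^.{0,' . $max . '}$/s';
    return preg_match($pattern, $text) === 1;
}

var_dump(fits_length(str_repeat("a\n", 1000), 3277));
var_dump(fits_length(str_repeat('a', 4000), 3277));
```

Note that `$` also matches just before a single trailing newline, so `\z` may be a better end anchor when an exact upper bound matters.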

2 Answers

1

In order to handle Perl-compatible regular expressions, PHP just bundles a third-party library that takes care of the job. The behaviour you describe is actually documented:

> The "*" quantifier is equivalent to {0,}, the "+" quantifier to {1,}, and the "?" quantifier to {0,1}. n and m are limited to non-negative integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms.

So there's always a hard limit. Why do your tests suggest that PHP's limit is 10 times smaller than the typical one? No idea about that :)

Álvaro González
  • +1 because this is probably the issue. The smaller limit seen in the question only applies when combining `.` and `\s` in a way that is virtually guaranteed to produce large amounts of backtracking. If we guess that the limit is affected by backtracking (which is a reasonable guess) then it would not be a surprise if the limit came down to this sort of level. – Spudley Aug 01 '13 at 11:29
0

Try using `^(.|\s){0,3276}(.|\s){0,1}$`