
I use this pretty simple regular expression:

^[\x20-\x7E]+$

When I use it with PHP's regex functions, such as preg_match(), it throws a warning, but only when I use the ~ character (tilde) as the delimiter.

So the following lines execute without problems:

preg_match("/^[\x20-\x7E]+$/", $s); # delimiter "/"
preg_match("!^[\x20-\x7E]+$!", $s); # delimiter "!"
preg_match("#^[\x20-\x7E]+$#", $s); # delimiter "#"

but, for some reason, this line

preg_match("~^[\x20-\x7E]+$~", $s); # delimiter "~"

throws this warning:

Warning: preg_match(): Unknown modifier ']' in some_script.php on line XX

Note: it happens only when the pattern is written in a double-quoted string!

I use the tilde as a delimiter all the time and had never run into problems with it until this case, so I really wonder why this happens. I can't find out whether the tilde has some special meaning in regular expressions (I'm now 99% sure it does not) or whether it's just a bug.

I can certainly work around this, but the question is: what's the difference between the tilde and any other delimiter?

AD7six
Wh1T3h4Ck5
  • Just use something else then! – GordonM Jun 26 '16 at 07:29
  • @GordonM Question is not how to solve this, it's WHY IT HAPPENS? ;) – Wh1T3h4Ck5 Jun 26 '16 at 07:30
  • Hmm, does it think that `$~` is a variable? try single quotes... `'~^[\x20-\x7E]+$~'` – Bitwise Creative Jun 26 '16 at 07:33
  • @rock321987 Maybe it's a PHP 5.4.16 problem? In my code the line actually reads `if (preg_match("~^[\x20-\x7E]+$~", $s) === 1) {...}`, where `$s` is any string. – Wh1T3h4Ck5 Jun 26 '16 at 07:35
  • in php 5.5 the same - https://eval.in/595641 – splash58 Jun 26 '16 at 07:37
  • @BitwiseCreative It works with single quotes; I forgot to mention that. Good point, I never thought about a variable problem. Maybe it does see that as a variable, but then I think it should throw a different warning, like `Undefined variable: ~ in ...`, as it does with "$x" when $x is not set. – Wh1T3h4Ck5 Jun 26 '16 at 07:39
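On the `$~` idea from the comments: PHP only interpolates `$` inside double quotes when it is followed by a letter, an underscore or a byte of 0x80 or above (the rule for variable names), so `$~` stays a literal dollar sign followed by a tilde, and no "Undefined variable" notice is expected. A quick check, offered as a sketch rather than something from the original thread:

```php
<?php
// "$" is only interpolated when followed by a letter, "_" or a byte >= 0x80,
// so "$~" in a double-quoted string is just the two characters '$' and '~'.
var_dump("$~" === '$~'); // bool(true)
var_dump(strlen("$~"));  // int(2)
```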

1 Answer


You were using double quotes:

 "~^[\x20-\x7E]+$~"

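That means both \x20 and \x7E got interpreted in PHP's string context, not by PCRE. And \x7E is the tilde, so preg_match() actually received ~^[ -~]+$~. The ~ inside the character class closed the pattern early, and the leftover ]+$~ was read as modifiers, hence "Unknown modifier ']'".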
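To see this directly, here is a small sketch that dumps the pattern string PHP builds under each quoting style (the variable names are just for illustration):

```php
<?php
// Same pattern text as in the question, in two quoting styles.
$double = "~^[\x20-\x7E]+$~"; // PHP expands \x20 -> " " and \x7E -> "~" here
$single = '~^[\x20-\x7E]+$~'; // single quotes leave \x20 and \x7E for PCRE

var_dump($double);        // string(10) "~^[ -~]+$~"
var_dump($single);        // string(16) "~^[\x20-\x7E]+$~"
var_dump("\x7E" === '~'); // bool(true)
```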

So, as @Bitwise mentioned, use single quotes. Or better yet, escape the escape sequences:

 "~^[\\x20-\\x7E]+$~"

Thus the regex engine will still see [\x20-\x7E] instead of [ -~].
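A minimal sketch comparing the fixes mentioned here, assuming the goal is still "printable ASCII only"; the labels and sample strings are made up for illustration:

```php
<?php
// Three ways to keep the pattern intact (labels are illustrative only).
$patterns = [
    'single quotes'         => '~^[\x20-\x7E]+$~',   // PCRE expands \x20/\x7E
    'escaped double quotes' => "~^[\\x20-\\x7E]+$~", // PHP turns \\ into \
    'other delimiter'       => "/^[\x20-\x7E]+$/",   // PHP expands to /^[ -~]+$/
];

// Sample inputs: one printable-ASCII string, one containing a tab (0x09).
$samples = ["Hello, world!", "tab\there"];

$results = [];
foreach ($patterns as $label => $pattern) {
    foreach ($samples as $s) {
        // preg_match() returns 1 on match, 0 on no match, false on error
        $results[$label][] = preg_match($pattern, $s);
    }
    echo $label, ': ', implode(', ', $results[$label]), PHP_EOL; // e.g. "single quotes: 1, 0"
}

// The first two patterns are byte-for-byte the same string.
var_dump($patterns['single quotes'] === $patterns['escaped double quotes']); // bool(true)
```

The third variant only works because `~` is no longer the delimiter; the class still contains a literal `~` after PHP's expansion.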

mario
  • Nice, escaping the escapes really helps... but the question still holds: what's the difference between "~" and "#" or "!" as delimiters from PCRE's point of view? With the other delimiters I don't have to escape the escapes, and the expression is still in double quotes ;) – Wh1T3h4Ck5 Jun 26 '16 at 07:43
  • There's no difference. Your tilde regex was failing because PCRE saw `[ -~]` (an unescaped delimiter). – Bitwise Creative Jun 26 '16 at 07:49
  • PCRE doesn't really know about PHP's delimiters. See also: http://stackoverflow.com/a/31231183/345031 – mario Jun 26 '16 at 07:50
  • @mario Aw, I see now that 7E is actually the tilde character! I hadn't realized that at first. Problem solved! – Wh1T3h4Ck5 Jun 26 '16 at 07:56
  • `Or better yet` using double quotes without needing to, and then escaping unwanted interpolation is not "better". – AD7six Jun 26 '16 at 08:21
  • @AD7six Perhaps. Understanding string contexts is more worthwhile than a generalized workaround. As is often forgotten, the backslash also escapes itself in single quotes. – mario Jun 26 '16 at 08:28
  • Understanding is important, and useful; using a language feature you don't want (double quotes), and then dealing with the side effects is not imo better. The simple rule "always use single quoted strings [unless you have to use double quotes]" avoids these surprises/problems at no cost - especially the cost of readability. – AD7six Jun 26 '16 at 08:33
  • @AD7six I'll admit that's just me being pompous and "look at clever me". Albeit I'd really escape it `'~^[\\x20-\\x7E]+$~'` in single quotes still. (AKA: this post was brought to you by the OCD department). So I only opt for single quotes on lengthier regexps, and edge cases where escaping became too tedious/unreadable. – mario Jun 26 '16 at 12:23
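Following up on the "there's no difference" point in the comments, here is a sketch showing that `#` fails in exactly the same way when a PHP escape happens to produce `#` (`\x23`), and that escaping the delimiter inside the class also avoids the problem. The `\x23` range is chosen only to trigger the collision and is not from the original thread:

```php
<?php
// In double quotes PHP expands the \x escapes before preg_match() runs:
$tilde = "~^[\x20-\x7E]+$~"; // -> ~^[ -~]+$~  (\x7E is "~")
$hash  = "#^[\x20-\x23]+$#"; // -> #^[ -#]+$#  (\x23 is "#")

// In both cases PHP ends the pattern at the delimiter inside [...] and reads
// the remainder, starting with "]", as modifiers. preg_match() warns
// "Unknown modifier ']'" and returns false; the @ only keeps this demo quiet.
var_dump(@preg_match($tilde, 'abc')); // bool(false)
var_dump(@preg_match($hash, '!!'));   // bool(false)

// Escaping the delimiter inside the class avoids the early end: PHP skips
// backslash-escaped characters when looking for the closing delimiter, and
// PCRE reads \~ as a literal "~".
var_dump(preg_match('~^[ -\~]+$~', 'abc')); // int(1)
```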