3

I am working on a project where I have to validate BECS characters. The Bulk Electronic Clearing System (BECS) only allows the following characters.

BECS Character set

Type                        Description
Numeric                     0 to 9
Alphabetic                  Uppercase A to Z and Lowercase a to z
+                           Plus sign
-                           Minus sign or hyphen
@                           At sign
SP                          Blank space
$                           Dollar sign
!                           Exclamation mark
%                           Percentage sign
&                           Ampersand
(                           Left Parenthesis
)                           Right Parenthesis
*                           Asterisk
.                           Period or decimal point
/                           Solidus (slash)
#                           Number Sign (Pound or Hash)
=                           Equal Sign
:                           Colon
;                           Semicolon
?                           Question mark
,                           Comma
’                           Apostrophe
[                           Left square bracket
]                           Right square bracket
_                           Low line (underscore)
^                           Circumflex

I have tried the following, but it is not working:

preg_match("/^[A-Za-z0-9^_[]',?;:=#/.*()&%!$ @+-]+$/", $string);

Alan Moore
Anam

3 Answers

6

Instead of worrying about escaping manually, use preg_quote.

The code would then be preg_match("/^[A-Za-z0-9".preg_quote("^_[]',?;:=#/.*()&%!$ @+-", "/")."]+$/", $string);
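
As a minimal sketch of the same idea, the ranges and the literal symbols can be kept in separate variables so that only the symbols pass through preg_quote (the ranges must stay unquoted, otherwise their hyphens get escaped). The names $ranges, $symbols, $pattern and isBecsQuoted() are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical names throughout; only preg_quote() and preg_match() are real PHP functions.
$ranges  = 'A-Za-z0-9';                   // ranges stay unquoted so the hyphens keep their meaning
$symbols = "^_[]',?;:=#/.*()&%!\$ @+-";   // literal symbols only, safe to pass to preg_quote
$pattern = '/^[' . $ranges . preg_quote($symbols, '/') . ']+$/';

function isBecsQuoted(string $value, string $pattern): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $value) === 1;
}

var_dump(isBecsQuoted('PAYMENT #42 (REF: A/B)', $pattern)); // bool(true)
var_dump(isBecsQuoted('a<b', $pattern));                    // bool(false), "<" is not allowed
```

preg_quote escapes more characters than a character class strictly needs, but PCRE treats an escaped non-alphanumeric character as a literal, so the extra backslashes should be harmless here.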

b0ne
  • `preg_quote` is for escaping literal strings, not characters inside a character class. E.g. in this case, it returns `A\-Za\-z0\-9\^_\[\]',\?;\:\=#\/\.\*\(\)&%\!\$ @\+\-`, which is wrong – Mariano Aug 15 '16 at 08:11
  • I've given this an upvote because it's a good suggestion generally. However regex escaping within a character range (ie within square brackets) has different quoting rules, so preg_quote probably won't be quite right in this context. – Simba Aug 15 '16 at 08:12
  • 1
    Thanks for correcting it, upvoted. It will escape more characters than what's needed, but this is safe as long as you don't pass character ranges to `preg_quote`. – Mariano Aug 15 '16 at 08:21
  • If you want to make it even cleaner, you could define your regex in two variables: one for the ranges and one for the other characters, and then escape only the second variable. This is not necessary if you're only using that regex once – b0ne Aug 15 '16 at 08:23
4

Inside character classes, you don't need to escape most of the metacharacters.

/^[A-Za-z0-9^_[\]',?;:=#\/.*()&%!$ @+-]+$/
  • ] is escaped to prevent it from closing the character class
  • / needs to be escaped because we're using it as regex delimiter
  • - doesn't need to be escaped because it's the last character in the class
  • ^ doesn't need to be escaped because it's not the first character in the class
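
A minimal sketch of how the pattern above might be wrapped in a validation helper. The function name isValidBecs() is hypothetical, and the D modifier is an addition that is not in the original pattern: without it, `$` also matches just before a trailing newline, which may or may not be acceptable for your input.

```php
<?php
// Hypothetical helper; only the character class comes from the answer above.
function isValidBecs(string $value): bool
{
    // Single-quoted PHP string: \] and \/ reach PCRE as written,
    // and \' is only PHP's own escape for the apostrophe.
    // D (PCRE_DOLLAR_ENDONLY) is an assumption added here so that
    // a trailing newline is rejected as well.
    $pattern = '/^[A-Za-z0-9^_[\]\',?;:=#\/.*()&%!$ @+-]+$/D';

    return preg_match($pattern, $value) === 1;
}

var_dump(isValidBecs("O'NEIL & CO. [PAY_01] 50%")); // bool(true)
var_dump(isValidBecs('price <10>'));                // bool(false), "<" and ">" are not allowed
var_dump(isValidBecs("ABC\n"));                     // bool(false) because of the D modifier
```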


Or, if you want a shorter expression, the following regex matches the same set of characters:

/^[ !#-;=?-[\]^_a-z]+$/
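
One way to convince yourself that the two patterns agree is to run both against every ASCII character. This is only a sketch; the variable names ($long, $short, $mismatches, $rejected) are hypothetical:

```php
<?php
// Both patterns are copied from this answer, written as single-quoted PHP strings.
$long  = '/^[A-Za-z0-9^_[\]\',?;:=#\/.*()&%!$ @+-]+$/';
$short = '/^[ !#-;=?-[\]^_a-z]+$/';

// Collect every ASCII code point on which the two patterns disagree.
$mismatches = [];
for ($code = 0; $code < 128; $code++) {
    $char = chr($code);
    if (preg_match($long, $char) !== preg_match($short, $char)) {
        $mismatches[] = $code;
    }
}
var_dump($mismatches === []); // bool(true)

// List the printable ASCII characters that the short pattern rejects.
$rejected = '';
for ($code = 32; $code < 127; $code++) {
    if (preg_match($short, chr($code)) !== 1) {
        $rejected .= chr($code);
    }
}
echo $rejected, PHP_EOL; // "<>\`{|}~
```

The short form works because the allowed symbols happen to sit in contiguous ASCII runs, so only the double quote, angle brackets, backslash, backtick, braces, pipe and tilde fall outside it.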
Mariano
1

You have to escape most of the chars in your string, as they have special meaning in regular expressions

(of course, you were right not to escape the leading ^ and the trailing $, as they anchor the match to the start and end of the string so that no other character can be in the line):

preg_match("/^[A-Za-z0-9\\^_\\[\\]',\\?;:=#/\\.\\*\\(\\)\\&\\%!\\$ @\\+\\-]+$/", $string);

For the record, a list of the BECS allowed chars vs. regular expression chars:

Type                        Description
Numeric                     0 to 9
Alphabetic                  Uppercase A to Z and Lowercase a to z
+                           Plus sign: means 1 or more
-                           Minus sign or hyphen: used for char range
$                           Dollar sign: end of line
(                           Left Parenthesis: starts a group
)                           Right Parenthesis: ends a group
*                           Asterisk: 0 or more
.                           Period or decimal point: any char (literal inside a character class, so it would work unescaped)
?                           Question mark: 0 or 1
[                           Left square bracket: starts a character class
]                           Right square bracket: ends a character class
^                           Circumflex: negation (first in a class) or start of line
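
As the comments below point out, the preg_match call above still fails with "Unknown modifier" because the `/` inside the class is not escaped and therefore ends the pattern early. A sketch of the same escape-everything approach with that one slash escaped (the variable name $escapedPattern is hypothetical) might look like this:

```php
<?php
// Same pattern as above, except that "/" is written as "\\/" so it no longer
// terminates the regex. In a double-quoted PHP string "\\" yields one backslash.
$escapedPattern = "/^[A-Za-z0-9\\^_\\[\\]',\\?;:=#\\/\\.\\*\\(\\)\\&\\%!\\$ @\\+\\-]+$/";

var_dump(preg_match($escapedPattern, 'A+B-C (x/y) [z]')); // int(1)
var_dump(preg_match($escapedPattern, 'back\\slash'));     // int(0), backslash is not allowed
```

Most of these backslashes are redundant inside a character class, but PCRE accepts them as literal escapes.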
Jean-François Fabre
  • 2
    It's not working. `preg_match(): Unknown modifier '\' on line number 5` – Anam Aug 15 '16 at 07:57
  • Added more backslashes. Seeing the other answers, they seem more appropriate, though. – Jean-François Fabre Aug 15 '16 at 09:00
  • 1
    Still, you can't have an odd number of backslashes. Also, I'm not a big fan of the "If you don't know the rules, escape everything"... – Thomas Ayoub Aug 15 '16 at 09:01
  • I'm editing the answer. Still for the triple backslash before $ I followed this answer: http://stackoverflow.com/questions/9716443/how-to-escape-dollar-sign-in-a-string-using-perl-regex – Jean-François Fabre Aug 15 '16 at 09:06
  • Your link is for a perl program, not a PHP one. Running your code, I get a `PHP Warning: preg_match(): Unknown modifier '\' in prog.php on line 4` because the `/` is not escaped. Also, it would have been a lot easier to use `"/^[\w^[\\]',?;:=#\\/.*()&%!$ @+-]+$/"` – Thomas Ayoub Aug 15 '16 at 10:22