We use PHP and MySQL, with everything encoded as UTF-8, and have recently had to start capturing non-Latin characters such as Chinese. Our PHP validation checks the string length and that the value is alphanumeric, for example:

if (!ereg("[[:alnum:]]{2,}",$_POST['company_name'])) {
    //error code here
}

This does not work on multibyte characters. I understand that the length is an issue (one character is not necessarily one byte), but I was hoping someone could provide a link or a solution for matching a string that contains only UTF-8 language characters, with no special characters such as [*/.

EDIT: I want to accept only a string that is xx characters long and contains only language characters, be it English, Chinese, etc., and NOT any special characters such as *{/. Hopefully that clarifies.

Smi
megaSteve4
  • What is the expected matching -- do you want to accept those non-Latin characters? Currently seems you're checking alphanumeric so those other chars won't be accepted. – Jerome Aug 26 '10 at 11:36

3 Answers

1

Your requirements are a little vague, but you can allow only letters (possibly combined with marks) and decimal digits with

if (!preg_match('/^[\p{L}\p{M}\p{Nd}]{2,}$/u', $_POST['company_name'])) {
   //error here
}
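
As a rough sketch of how that check might be wrapped up with a configurable minimum length: the helper name `isLanguageString` and its `$minLength` parameter are made up for illustration and are not part of the original answer. It assumes the source file and the input are both UTF-8, and that PCRE was built with Unicode support.

```php
<?php
// Hypothetical helper: name and signature are illustrative only.
function isLanguageString(string $value, int $minLength = 2): bool
{
    // \p{L} = letters, \p{M} = combining marks, \p{Nd} = decimal digits.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so the {n,} quantifier counts code points rather than bytes.
    $pattern = '/^[\p{L}\p{M}\p{Nd}]{' . max(1, $minLength) . ',}$/u';

    // preg_match returns 1 on a match, 0 on no match and false on error
    // (for example when the subject is not valid UTF-8).
    return preg_match($pattern, $value) === 1;
}

var_dump(isLanguageString('若您是参展商'));
var_dump(isLanguageString('test'));
var_dump(isLanguageString("test{}/'*/-"));
var_dump(isLanguageString('若')); // one character, although it is three bytes
```

Note that `$` also matches just before a trailing newline; adding the `D` modifier (`/Du`) tightens that if it matters for your input.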
Artefacto
  • I'm a regex newbie, so please be patient! I cannot get this to work. I have tried the three values below, but it always reports the error. Please help! Thanks. $var = "若您是参展商"; // same result //$var = "test"; // same result //$var = "test{}/'*/-"; if (!preg_match('/^[\p{L}\p{M}\p{Nd}]{2,}$/u', $var)) { echo "Not just Unicode language chars - error"; } – megaSteve4 Aug 26 '10 at 14:00
  • @user It works. See here: http://codepad.viper-7.com/kGxOG2 Make sure your data is actually encoded in UTF-8 (for instance, the characters are not encoded with HTML entities). – Artefacto Aug 26 '10 at 15:33
  • Thanks for your patience. I do believe that your code works, but I am running the exact same code on our server with true UTF-8 characters and both checks fail: '$var fails $var2 fails'. We have the same PHP version as the codepad. Could any other server variables or settings make your regex fail? Once again, thanks for your help. – megaSteve4 Aug 26 '10 at 15:55
  • @user Check if the output of [this](http://codepad.viper-7.com/VroOuL) is the same. – Artefacto Aug 26 '10 at 16:49
  • I have just run the exact same code on our server and both vars still fail. Without wanting to pester you too much, do you have any more suggestions? Thanks again – megaSteve4 Aug 27 '10 at 09:25
  • @user Is the output of the first var_dump the same (the long string)? – Artefacto Aug 27 '10 at 12:07
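
For anyone hitting the same wall, the checks suggested in the comments above could be sketched roughly as follows. The function name `diagnoseUnicodeRegex` and its messages are made up for illustration. It looks at three possible causes: a PCRE build without UTF-8 or Unicode property support, input that is not valid UTF-8, and characters that arrived as HTML numeric entities. Whether any of these was the actual cause in this thread is not known.

```php
<?php
// Hypothetical diagnostic helper: name and messages are illustrative only.
function diagnoseUnicodeRegex(string $value): string
{
    // 1. Does this PCRE build accept /u and \p{..} at all?
    //    A pattern that fails to compile makes preg_match return false
    //    and emit a warning (silenced here with @).
    if (@preg_match('/^\p{L}$/u', 'a') !== 1) {
        return 'PCRE lacks UTF-8 or Unicode property support';
    }

    // 2. Is the subject valid UTF-8? With /u, an invalid subject makes
    //    preg_match return false and sets PREG_BAD_UTF8_ERROR.
    if (preg_match('//u', $value) === false
        && preg_last_error() === PREG_BAD_UTF8_ERROR) {
        return 'input is not valid UTF-8';
    }

    // 3. Did the characters arrive as HTML numeric entities such as &#33509; ?
    if (preg_match('/&#x?[0-9a-f]+;/i', $value) === 1) {
        return 'input contains HTML numeric entities';
    }

    return 'ok';
}

echo diagnoseUnicodeRegex('若您是参展商'), "\n";
echo diagnoseUnicodeRegex("\xC8\xF4"), "\n";        // bytes that are not valid UTF-8
echo diagnoseUnicodeRegex('&#33509;&#24744;'), "\n"; // entity-encoded input
```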
0

PHP's mbstring extension has an mb_ereg() function, the multibyte-aware counterpart of ereg(); this would probably be a good starting point.

coudenysj
0

You can try matching with \p{L}|\p{N}, but you need to add the u modifier to your regex.
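
One thing to be aware of with \p{N}: it covers every numeric category, not just decimal digits, so it is broader than \p{Nd}. A small sketch of the difference, where the two function names are hypothetical and the file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical helpers: names are illustrative only.

// Letters or any kind of number (Nd, Nl and No categories).
function isLetterOrNumber(string $char): bool
{
    return preg_match('/^(?:\p{L}|\p{N})$/u', $char) === 1;
}

// Letters or decimal digits only (Nd category).
function isLetterOrDecimalDigit(string $char): bool
{
    return preg_match('/^[\p{L}\p{Nd}]$/u', $char) === 1;
}

// ASCII digit, Arabic-Indic digit, Roman numeral (Nl), vulgar fraction (No)
foreach (['7', '٣', 'Ⅷ', '½'] as $char) {
    printf(
        "%s  \\p{N}: %s  \\p{Nd}: %s\n",
        $char,
        var_export(isLetterOrNumber($char), true),
        var_export(isLetterOrDecimalDigit($char), true)
    );
}
```

If fractions and Roman numerals should be rejected, \p{Nd} is probably the safer choice.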

Sources:
www.regular-expressions.info

Colin Hebert