I am using the Form Validation library in CodeIgniter. The config below is meant to allow digits, English letters, Chinese characters, and spaces, but it does not work.

$config = array(
    array(
        'field' => 'keywords',
        'label' => 'keywords',
        'rules' => 'regex_match[/[a-zA-Z0-9 \u4e00-\u9fa5]+$/]'
    )
);

However, if I remove '\u4e00-\u9fa5', it works.

$config = array(
    array(
        'field' => 'keywords',
        'label' => 'keywords',
        'rules' => 'regex_match[/[a-zA-Z0-9 ]+$/]'
    )
);
IvanK

2 Answers

1

There are three issues in the regex you have:

  • A validation regex must match from the start of the string, so you need the start-of-string anchor ^ or \A. It is also advisable to replace $ with \z, the very-end-of-string anchor, because $ also matches before a final newline in the string.
  • Revo is right: the \uXXXX notation is not supported by the PHP regex engine. However, you do not have to spell out a range of Unicode code points here. Chinese characters can be matched in a PHP PCRE regex with the Unicode script property \p{Han}.
  • For a PCRE regex to become Unicode aware, you need the /u modifier.

So, use

/\A[a-zA-Z0-9\s\p{Han}]+\z/u

Or (a tiny bit less secure),

/^[a-zA-Z0-9\s\p{Han}]+$/u
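
As a rough way to check the pattern outside CodeIgniter, here is a minimal sketch using plain preg_match. The helper name is_valid_keywords is hypothetical (it is not part of CodeIgniter or PHP), and the sketch assumes the source file and the input are UTF-8 encoded, since the /u modifier rejects invalid UTF-8.

```php
<?php
// Hypothetical helper for illustration; only preg_match() is a real PHP function here.
function is_valid_keywords(string $input): bool
{
    // \A and \z anchor the match to the whole string.
    // /u switches PCRE to UTF-8 mode so that \p{Han} matches Han characters.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/\A[a-zA-Z0-9\s\p{Han}]+\z/u', $input) === 1;
}

var_dump(is_valid_keywords('PHP 7 正则表達式')); // bool(true): letters, a digit, spaces, Han characters
var_dump(is_valid_keywords('hello!'));           // bool(false): "!" is not in the character class
var_dump(is_valid_keywords(''));                 // bool(false): "+" requires at least one character
```

If this passes on the command line but the CodeIgniter rule still fails, the difference probably lies in how the rule string reaches the regex engine rather than in the pattern itself.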
Wiktor Stribiżew
  • I tested both patterns. Both validate English words and spaces, but Chinese words do not work. – IvanK Jul 03 '16 at 12:38
  • Please provide a sample input string. It [matches Chinese words](https://regex101.com/r/bC2iK2/1). – Wiktor Stribiżew Jul 03 '16 at 12:39
  • I'm not sure about OP, but `\p{Han}` includes a lot more characters than the range provided. E.g. `\x{3400}-\x{4dbf}` (CJK Unified Ideographs Extension A) – revo Jul 03 '16 at 13:12
  • @revo: What works well for Traditional Chinese characters? @Wiktor Stribiżew: It works on the online test site, but not in my code. – IvanK Jul 03 '16 at 13:36
  • I forgot about the `/u` modifier. Try `/^[a-zA-Z0-9\s\p{Han}]+$/u`. – Wiktor Stribiżew Jul 03 '16 at 14:11
  • If this won't work, you will need to implement a callback like [here](http://stackoverflow.com/questions/13982529/form-validation-rules-for-regex-match). – Wiktor Stribiżew Jul 03 '16 at 14:20
  • Thank you, Wiktor Stribiżew. It works now. I searched Google for the difference between \p{Han} and \x{3400}-\x{4dbf}. I am not sure, but I think they are the same. If anybody finds a difference, please let me know. Thank you. – IvanK Jul 03 '16 at 15:58
  • It is possible Han includes more than one range. See [this PHP source code](https://github.com/php/php-src/blob/1c295d4a9ac78fcc2f77d6695987598bb7abcb83/ext/mbstring/php_unicode.h#L187). – Wiktor Stribiżew Jul 03 '16 at 18:32
  • @IvanK: If it works, please consider accepting the answer. – Wiktor Stribiżew Sep 13 '16 at 12:26
0

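PCRE does not support the \uFFFF syntax. Use \x{FFFF} instead, together with the /u modifier, because PHP's PCRE only accepts \x{...} values above 0xFF in UTF-8 mode.

/[a-zA-Z0-9 \x{4e00}-\x{9fa5}]+$/u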
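
To see how an explicit \x{4e00}-\x{9fa5} range compares with \p{Han}, here is a small sketch. It assumes PHP 7 or later (for the "\u{...}" string escape), and the variable names are illustrative only. Both patterns use /u, since code points above 0xFF in \x{...} require UTF-8 mode.

```php
<?php
// Explicit range: the main CJK Unified Ideographs block only.
$range = '/^[\x{4e00}-\x{9fa5}]+$/u';
// Script property: every character whose Unicode script is Han.
$han = '/^\p{Han}+$/u';

$basic = "\u{4E2D}"; // 中, inside U+4E00..U+9FA5
$extA  = "\u{3400}"; // first character of CJK Unified Ideographs Extension A

var_dump(preg_match($range, $basic)); // int(1)
var_dump(preg_match($han, $basic));   // int(1)
var_dump(preg_match($range, $extA));  // int(0): outside the explicit range
var_dump(preg_match($han, $extA));    // int(1): still a Han character
```

So the two are not equivalent: the explicit range covers one block, while \p{Han} also accepts Han characters from other blocks such as Extension A.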
revo