CodeIgniter Form Validation for Chinese Words

Question

I am using the Form Validation library in CodeIgniter. The config below tries to allow digits, English letters, Chinese characters, and spaces, but it does not work.

$config = array(
    array(
        'field' => 'keywords',
        'label' => 'keywords',
        'rules' => 'regex_match[/[a-zA-Z0-9 \u4e00-\u9fa5]+$/]'
    )
);

However, if I remove '\u4e00-\u9fa5', it works.

$config = array(
    array(
        'field' => 'keywords',
        'label' => 'keywords',
        'rules' => 'regex_match[/[a-zA-Z0-9 ]+$/]'
    )
);

Try `/^[a-zA-Z0-9\s\p{Han}]+$/`: in PCRE, you can match Chinese characters with the `\p{Han}` Unicode property. – Wiktor Stribiżew, Jul 03 '16 at 10:20
I tested it. It validates English words and spaces, but Chinese words do not work. – IvanK, Jul 03 '16 at 12:40

score 1 · Answer 1 · edited May 23 '17 at 10:29


There are three issues in the regex you have:

1. The validation regex should start matching at the start of the string, so you need the start-of-string anchor `^` or `\A`. It is also advisable to replace `$` with the very-end-of-string anchor `\z`, since `$` also matches before a final newline in the string.
2. As revo points out, the `\uXXXX` notation is not supported by the PHP regex engine. However, you do not have to specify a range of Unicode code points here: Chinese characters in a PHP PCRE regex can be matched with the Unicode property `\p{Han}`.
3. For a PCRE regex to become Unicode-aware, you need the `/u` modifier.

So, use

/\A[a-zA-Z0-9\s\p{Han}]+\z/u

Or (a tiny bit less secure),

/^[a-zA-Z0-9\s\p{Han}]+$/u
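
To check the pattern outside CodeIgniter, a minimal sketch with plain `preg_match` may help. The function name `is_valid_keywords` is hypothetical and only for illustration; the pattern is the first one above. The sample strings use `\u{...}` escapes (PHP 7+) so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): wraps the pattern from the answer.
function is_valid_keywords(string $input): bool
{
    // \A and \z anchor the whole string; /u turns on UTF-8 mode so \p{Han} is understood.
    // preg_match() returns 1 on a match, 0 on no match, false on error (e.g. invalid UTF-8).
    return preg_match('/\A[a-zA-Z0-9\s\p{Han}]+\z/u', $input) === 1;
}

// "PHP 中文 123": ASCII letters, Han characters, digits, and spaces
var_dump(is_valid_keywords("PHP \u{4E2D}\u{6587} 123"));

// "中文!": the exclamation mark is outside the character class
var_dump(is_valid_keywords("\u{4E2D}\u{6587}!"));

// Empty string: rejected because + requires at least one character
var_dump(is_valid_keywords(''));
```

If this behaves as expected but the CodeIgniter rule still fails, the problem is more likely in how the rule string is passed than in the pattern itself.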

answered Jul 03 '16 at 10:25

Wiktor Stribiżew


I tested both patterns. Both validate English words and spaces, but Chinese words do not work. – IvanK Jul 03 '16 at 12:38
Please provide a sample input string. It [matches Chinese words](https://regex101.com/r/bC2iK2/1). – Wiktor Stribiżew Jul 03 '16 at 12:39
I'm not sure about OP, but `\p{Han}` includes a lot more characters than the range provided. E.g. `\x{3400}-\x{4dbf}` (CJK Unified Ideographs Extension A) – revo Jul 03 '16 at 13:12
@revo: What works well for Traditional Chinese words? @Wiktor Stribiżew: It works on the online test site, but not in my code. – IvanK Jul 03 '16 at 13:36
I forgot about the `/u` modifier. Try `/^[a-zA-Z0-9\s\p{Han}]+$/u`. – Wiktor Stribiżew Jul 03 '16 at 14:11
If this won't work, you will need to implement a callback like [here](http://stackoverflow.com/questions/13982529/form-validation-rules-for-regex-match). – Wiktor Stribiżew Jul 03 '16 at 14:20
Thank you, Wiktor Stribiżew. It works now. I searched on Google for the difference between `\p{Han}` and `\x{3400}-\x{4dbf}`. I am not sure, but I think they are the same. If anybody finds a difference, please let me know. Thank you. – IvanK Jul 03 '16 at 15:58
It is possible Han includes more than one range. See [this PHP source code](https://github.com/php/php-src/blob/1c295d4a9ac78fcc2f77d6695987598bb7abcb83/ext/mbstring/php_unicode.h#L187). – Wiktor Stribiżew Jul 03 '16 at 18:32
@IvanK: If it works, please consider accepting the answer. – Wiktor Stribiżew Sep 13 '16 at 12:26
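
On the question of whether `\p{Han}` and an explicit range are the same: a small sketch can show that they differ. It compares `\p{Han}` with the range `\x{4e00}-\x{9fa5}` from the question, using one character inside that range and one from CJK Unified Ideographs Extension A (the block revo mentions). The function names are hypothetical.

```php
<?php
// Hypothetical helpers for comparing the two ways of matching Chinese characters.
function matches_han_property(string $s): bool
{
    // Matches strings made only of characters in the Han script.
    return preg_match('/\A\p{Han}+\z/u', $s) === 1;
}

function matches_basic_range(string $s): bool
{
    // Matches strings made only of code points U+4E00..U+9FA5.
    return preg_match('/\A[\x{4e00}-\x{9fa5}]+\z/u', $s) === 1;
}

$common = "\u{4E2D}"; // 中, inside U+4E00..U+9FA5
$extA   = "\u{3400}"; // 㐀, CJK Unified Ideographs Extension A

var_dump(matches_han_property($common), matches_basic_range($common));
var_dump(matches_han_property($extA), matches_basic_range($extA));
```

The explicit range covers only one block, while `\p{Han}` follows the Unicode script property, so it also accepts extension-block characters that the range rejects.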

score 0 · Answer 2 · answered Jul 03 '16 at 09:17


PCRE does not support the \uFFFF syntax. Use \x{FFFF} instead.

/[a-zA-Z0-9 \x{4e00}-\x{9fa5}]+$/
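
A note on running this pattern in PHP: as far as I know, PCRE only accepts `\x{...}` values above `0xFF` when UTF-8 mode is on, so the pattern needs the `u` modifier. The sketch below compares the two variants; `try_pattern` is a hypothetical helper, and `@` is used only to silence the compile warning so the return value can be inspected.

```php
<?php
// Hypothetical helper: returns preg_match()'s raw result (1, 0, or false on error).
function try_pattern(string $pattern, string $subject)
{
    // @ suppresses the E_WARNING that PHP raises when a pattern fails to compile.
    return @preg_match($pattern, $subject);
}

$subject = "abc \u{4E2D}\u{6587}"; // "abc 中文"

// Without /u: the pattern is expected to fail to compile, giving false.
var_dump(try_pattern('/[a-zA-Z0-9 \x{4e00}-\x{9fa5}]+$/', $subject));

// With /u: the pattern compiles and matches the subject.
var_dump(try_pattern('/[a-zA-Z0-9 \x{4e00}-\x{9fa5}]+$/u', $subject));
```

This may explain why the pattern works in an online tester with Unicode mode enabled but not in application code without the modifier.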

answered Jul 03 '16 at 09:17

revo


I tested it. It does not validate Chinese words, English words, or spaces. – IvanK Jul 03 '16 at 12:39
@IvanK Given Wiktor's sample, this regex [should work](https://regex101.com/r/bC2iK2/2). – revo Jul 03 '16 at 12:50
It works on the online test site, but not in my code. – IvanK Jul 03 '16 at 13:36
@IvanK It's possible to use `u` modifier here as well: `/[a-zA-Z0-9 \x{4e00}-\x{9fa5}]+$/u` – revo Jul 05 '16 at 22:38
