Regex literal for accepting names having non-English characters

Question

Below is the regex pattern literal in JavaScript for accepting the first name, middle name and last name from an HTML input element. Zero or one space is allowed between the names.

var regpat = /^[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}$/;

Inputs:

First name is 優子, last name is 白石, in Japanese.

First name is 瞿, middle name is 秋, last name is 白, in Chinese.

1) Is this regex pattern sufficient to accept names written in English characters?

2) How can this regex literal be enhanced to accept Chinese/Korean/Japanese characters in the Unicode BMP (plane 0)?

Note: Unicode has 17 planes.
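
To make the problem concrete, here is a minimal sketch that runs the pattern above against the sample inputs. The English name and the `samples` and `results` variables are illustrative assumptions, not part of the original form code.

```javascript
// The pattern from the question: three runs of ASCII letters, dot,
// apostrophe, space or hyphen, each 2 to 15 characters long.
var regpat = /^[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}$/;

// Hypothetical inputs: one English name plus the two names from the question.
var samples = ['John Fitzgerald Kennedy', '優子 白石', '瞿 秋 白'];

// The CJK characters are outside every character class, so only the
// first sample matches.
var results = samples.map(function (name) {
  return regpat.test(name);
});

console.log(results); // [ true, false, false ]
```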

You do know that even the BMP includes many [control characters](https://en.wikipedia.org/wiki/Unicode_control_characters) and [combining characters](https://en.wikipedia.org/wiki/Combining_character) that need to be used with other characters, and that the BMP may expand with Unicode updates, right? — Sheepy, Dec 21 '15 at 07:32
@Sheepy The relevant point is to know the CJK ranges. `var regpat = /^[\u0041-\u005A\u0061-\u007A\u4E00-\u9FFF\.\' \-]{2,15}\s?/; console.log(regpat.test('優子'));` — overexchange, Dec 21 '15 at 07:45
You are probably going to be interested in reading this: http://www.w3.org/International/questions/qa-personal-names. However, note that the entire BMP is `\u0000-\uFFFF` (this does include the NULL character, though). Would people named "Adèle" and "Élise" be permitted to use your form? — Dean Taylor, Dec 21 '15 at 07:53
@overexchange OK, the improved question is better. But BMP Chinese is pretty incomplete: [陶](https://en.wikipedia.org/wiki/Chip_Tsao), a well-known columnist and broadcaster in Hong Kong, has a name in the supplementary plane. Why do you want to limit to the BMP? — Sheepy, Dec 21 '15 at 08:16
Possible duplicate: [What's the complete range for Chinese characters in Unicode?](http://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode). Alternatively, Wikipedia has a pretty up-to-date [CJK list](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs). — Sheepy, Dec 21 '15 at 08:24
@Sheepy I need to use surrogate pairs for non-BMP characters, so I am looking for the Unicode code point escape syntax. — overexchange, Dec 21 '15 at 08:45
@overexchange So the BMP limit is a technical issue, which is another question. There exists an ES6 regexp [transpiler](https://github.com/mathiasbynens/regexpu), found with a simple [Google search](https://www.google.com/search?safe=off&q=ES6+unicode+regex+transpiler). While we are at it, what are you trying to do? — Sheepy, Dec 21 '15 at 08:48
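
The points raised in the comments can be sketched as follows. This is only an illustration, assuming an engine that supports the ES6 `u` flag. The names `bmpCjk`, `withExtB` and `extBChar` are hypothetical, and the supplementary range used (CJK Unified Ideographs Extension B, U+20000 to U+2A6DF) is just one of several CJK extension blocks.

```javascript
// BMP only: the CJK Unified Ideographs block, as used in the comment above.
var bmpCjk = /^[\u4E00-\u9FFF]+$/;

// With the ES6 `u` flag, \u{...} code point escapes can address
// supplementary-plane characters directly, without writing surrogate pairs.
var withExtB = /^[\u4E00-\u9FFF\u{20000}-\u{2A6DF}]+$/u;

// U+20BB7 lies in Extension B; a JavaScript string stores it as two
// UTF-16 code units (a surrogate pair).
var extBChar = '\u{20BB7}';

console.log(bmpCjk.test('優子'));     // true: both characters are in the BMP block
console.log(extBChar.length);         // 2
console.log(bmpCjk.test(extBChar));   // false: surrogates fall outside U+4E00-U+9FFF
console.log(withExtB.test(extBChar)); // true
```

Under the `u` flag, identity escapes such as `\'` are rejected as a syntax error, so the escapes in the original character class would need trimming before that flag is added. Kana and Hangul live in other blocks and are not covered by these ranges.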

0 Answers