
Below is a JavaScript regex literal for accepting a first name, middle name, and last name from an HTML input element. There is zero or one space between the names.

var regpat = /^[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}$/;

Inputs:

First name is 優子, last name is 白石, in Japanese

First name is , middle name is , last name is , in Chinese

1) Is this regex pattern complete for accepting names written in English characters?

2) How can this regex literal be enhanced to accept Chinese/Korean/Japanese characters that sit in the Unicode BMP (plane 0)?

Note: Unicode has 17 planes.
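To make the first question concrete, here is a minimal check of the pattern above against a few sample inputs. The sample names are illustrative assumptions, not part of the original input list, and the stray trailing slash after the literal is dropped so that it parses.

```javascript
// The pattern from the question, without the stray trailing slash.
var regpat = /^[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}\s?[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}$/;

// Illustrative sample inputs (assumptions for this sketch).
var samples = [
  'John Fitzgerald Kennedy', // three ASCII name parts
  'John Smith',              // only two parts
  'John   Smith',            // several consecutive spaces
  'Al Li',                   // fewer than six characters in total
  'Adèle Dupont',            // accented Latin letter
  '優子 白石'                 // Japanese name from the inputs above
];

samples.forEach(function (name) {
  console.log(JSON.stringify(name) + ' -> ' + regpat.test(name));
});
```

The first three samples match and the last three do not. Because the space character is itself inside each character class, the pattern does not really enforce "zero or one space": the three `{2,15}` groups can absorb spaces, so any run of 6 or more allowed characters passes, while short names such as `Al Li` fail on length alone.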

overexchange
  • You do know that even the BMP includes many [control characters](https://en.wikipedia.org/wiki/Unicode_control_characters) and [combining characters](https://en.wikipedia.org/wiki/Combining_character) that need to be used with other characters, and that the BMP may expand with Unicode updates, right? – Sheepy Dec 21 '15 at 07:32
  • @Sheepy The relevant point is to know the CJK ranges. `var regpat = /^[\u0041-\u005A\u0061-\u007A\u4E00-\u9FFF\.\' \-]{2,15}\s?/; console.log(regpat.test('優子'));` – overexchange Dec 21 '15 at 07:45
  • You are probably going to be interested in reading this: http://www.w3.org/International/questions/qa-personal-names However, note that the entire BMP is `\u0000-\uFFFF` (this does include the NULL character, though). Would people named "Adèle" and "Élise" be permitted to use your form? – Dean Taylor Dec 21 '15 at 07:53
  • @overexchange OK, the improved question is better. But BMP Chinese is pretty incomplete: [陶](https://en.wikipedia.org/wiki/Chip_Tsao), a well-known columnist and broadcaster in Hong Kong, has a name in a supplementary plane. Why do you want to limit it to the BMP? – Sheepy Dec 21 '15 at 08:16
  • Possible duplicate: [What's the complete range for Chinese characters in Unicode?](http://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode). Alternatively, Wikipedia has a pretty up-to-date [CJK list](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs). – Sheepy Dec 21 '15 at 08:24
  • @Sheepy I need to use surrogate pairs for non-BMP characters, for which I am looking for the Unicode code point escape syntax. – overexchange Dec 21 '15 at 08:45
  • @overexchange So the BMP is a technical issue, which is another question. There is an ES6 regexp [transpiler](https://github.com/mathiasbynens/regexpu), found with a simple [Google search](https://www.google.com/search?safe=off&q=ES6+unicode+regex+transpiler). While we are at it, what are you trying to do? – Sheepy Dec 21 '15 at 08:48
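The comments point at two ingredients: explicit CJK ranges and ES6 code point escapes. A minimal sketch combining them follows, assuming an engine that supports the ES6 `u` flag (older engines would need a transpiler such as the regexpu project linked above). The names `namePart` and `cjkName` are hypothetical, the listed ranges are Unicode block boundaries and not a complete inventory of CJK characters, and the `{1,15}` bound is an assumption that relaxes the question's `{2,15}` so that single-character name parts are allowed.

```javascript
// Hypothetical helper names (namePart, cjkName); not from the original post.
var namePart =
  "[A-Za-z.'\\-" +          // ASCII letters, dot, apostrophe, hyphen
  "\\u3040-\\u309F" +       // Hiragana
  "\\u30A0-\\u30FF" +       // Katakana
  "\\u3400-\\u4DBF" +       // CJK Unified Ideographs Extension A
  "\\u4E00-\\u9FFF" +       // CJK Unified Ideographs
  "\\uAC00-\\uD7AF" +       // Hangul Syllables
  "\\u{20000}-\\u{2A6DF}" + // CJK Extension B (supplementary plane, needs the u flag)
  "]{1,15}";                // assumption: 1 to 15 code points per name part

// One to three name parts, separated by exactly one space.
var cjkName = new RegExp("^" + namePart + "(?: " + namePart + "){0,2}$", "u");

console.log(cjkName.test("優子 白石"));               // true
console.log(cjkName.test("John Fitzgerald Kennedy")); // true
console.log(cjkName.test("\u{20000}\u{20001} 白石")); // true (non-BMP characters)
console.log(cjkName.test("John   Smith"));            // false (more than one space)
console.log(cjkName.test("Adèle Dupont"));            // false (è is not in the class)
```

With the `u` flag the quantifier counts code points instead of UTF-16 code units, so a supplementary-plane character counts as one character and no hand-written surrogate pairs are needed. The space is kept out of the character class so that the separator is exactly one space. Accented Latin letters are still rejected, which is the concern raised about "Adèle" and "Élise".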

0 Answers