JavaScript regex: alphanumeric English and Japanese

Question

I am trying to write a regex that allows only the letters A-Z, the digits 0-9, dash (-) and underscore (_), plus Japanese characters.

$.validator.addMethod("alphaDash", function(value, element) {
        return this.optional(element) || /^[a-zA-Z0-9-_]+$/i.test(value);
      }, "Username must contain only letters, numbers, dashes or underscores.");

The regex above, /^[a-zA-Z0-9-_]+$/, only matches English characters. How can I make it also accept Japanese characters (Hiragana, Katakana, Kanji)?

See [Check whether a string contains Japanese/Chinese characters](http://stackoverflow.com/questions/43418812/check-whether-a-string-contains-japanese-chinese-characters). — Wiktor Stribiżew, Apr 27 '17 at 11:56
FWIW, the `XRegExp` lib is pretty darned cool: http://xregexp.com/plugins/#unicode — T.J. Crowder, Apr 27 '17 at 11:56
Does [`^[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9fa-zA-Z0-9-_]+$`](https://regex101.com/r/pLoL5S/1) work for you? — Wiktor Stribiżew, Apr 27 '17 at 12:00
Watch out since these are script ranges, they do not match just letters/digits. Perhaps, you really need to use XRegExp and its `\pL` and `\pN` constructs to match any Unicode letter and digit. — Wiktor Stribiżew, Apr 27 '17 at 12:07
@WiktorStribiżew I tried the lib with this: ``/[a-zA-Z0-9-_\p{Hiragana}\p{Katakana}]+$/`` but it fails if my string ends with a Hiragana or Katakana char, which I don't want — Kiow, Apr 27 '17 at 12:31
@WiktorStribiżew **werえ** will fail, **werえ3** will pass — Kiow, Apr 27 '17 at 12:34
[I got *true* in both cases](https://jsfiddle.net/x455p6hq/). — Wiktor Stribiżew, Apr 27 '17 at 12:36
@WiktorStribiżew my code ``$.validator.addMethod("alphaDash", function(value, element) { return this.optional(element) || /[a-zA-Z0-9-_\p{Hiragana}\p{Katakana}]+$/i.test(value); }, "Username must contain only letters, numbers, dashes or underscores.");`` — Kiow, Apr 27 '17 at 12:37
Sorry, you are doing it all wrong. You cannot use Unicode properties like `\p{Han}` (this matches all Chinese chars) with JS native `RegExp`. You must reference the `XRegExp` library. — Wiktor Stribiżew, Apr 27 '17 at 12:39
@WiktorStribiżew got it to work: ``$.validator.addMethod("alphaDash", function(value, element) { var re = XRegExp('^[a-zA-Z0-9-_\\p{Hiragana}\\p{Katakana}]+$'); return this.optional(element) || re.test(value); }, "Username must contain only letters, numbers, dashes or underscores.");`` — Kiow, Apr 27 '17 at 12:52
Yes, but `[a-zA-Z0-9_]` = `\w`. Also, don't you need to match Kanji as well? You only included Hiragana & Katakana. — Wiktor Stribiżew, Apr 27 '17 at 12:53
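
The behaviour Kiow reports in the comments above can be reproduced with a plain `RegExp`. The sketch below assumes a standard JavaScript engine and a pattern without the `u` flag: there, `\p` is read as an escaped letter `p`, so `\p{Hiragana}` adds the literal characters `p`, `{`, `H`, `i`, and so on to the class rather than any kana, and without a leading `^` only the tail of the string is checked. The name `legacyRx` is illustrative.

```javascript
// Native RegExp without the `u` flag: `\p{Hiragana}` is NOT a Unicode property
// here. `\p` is read as a literal "p", and "{Hiragana}" as literal characters.
const legacyRx = /[a-zA-Z0-9-_\p{Hiragana}\p{Katakana}]+$/i;

// No leading ^, so only the end of the string has to match the class.
console.log(legacyRx.test("werえ"));   // false: the last char "え" is not in the class
console.log(legacyRx.test("werえ3"));  // true: the trailing "3" satisfies `+$`
console.log(legacyRx.test("{}"));      // true: the braces became literal class members
```

This is why switching to `XRegExp` (which does interpret `\p{...}`) and adding `^` at the start both matter.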

score 3 · Accepted Answer · answered Apr 27 '17 at 13:10

According to XRegExp's Unicode scripts:

Hiragana (\p{Hiragana}) char regex: [\u3041-\u3096\u309D-\u309F]|\uD82C\uDC01|\uD83C\uDE00
Katakana (\p{Katakana}) char regex: [\u30A1-\u30FA\u30FD-\u30FF\u31F0-\u31FF\u32D0-\u32FE\u3300-\u3357\uFF66-\uFF6F\uFF71-\uFF9D]|\uD82C\uDC00
Kanji (\p{Han}): [\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FD5\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1]|\uD87E[\uDC00-\uDE1D]

You may either use XRegExp (which is preferable since the library is constantly updated):

var rx = new XRegExp("^[-\\w\\p{Hiragana}\\p{Katakana}\\p{Han}]+$");
console.log(XRegExp.test("werえ", rx));
console.log(XRegExp.test("werえ3", rx));

<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
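
Where the target engines support ECMAScript 2018 Unicode property escapes (an assumption about your environment, and not an option when this thread was written), a similar check can be sketched without a library. The `u` flag is required and the scripts are spelled `\p{Script=...}`; `nativeRx` is an illustrative name.

```javascript
// Requires an engine with ES2018 Unicode property escapes; the `u` flag is mandatory.
// `\w` covers A-Z, a-z, 0-9 and underscore; the leading `-` is a literal dash.
const nativeRx = /^[-\w\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]+$/u;

console.log(nativeRx.test("werえ"));            // true
console.log(nativeRx.test("werえ3"));           // true
console.log(nativeRx.test("カタカナ_漢字-1"));   // true
console.log(nativeRx.test("wer え"));           // false: the space is not allowed
```

Script-based matching is strict: the long-vowel mark `ー` (U+30FC), for instance, is classed as Common script rather than Katakana, so it is worth testing names such as `ラーメン` against your own inputs and, if needed, adding `\u30FC` or using `\p{Script_Extensions=Katakana}`.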

Or you may use those ranges to build a plain regex, which you will then have to maintain yourself:

var pHiragana = "[\\u3041-\\u3096\\u309D-\\u309F]|\\uD82C\\uDC01|\\uD83C\\uDE00";
var pKatakana = "[\\u30A1-\\u30FA\\u30FD-\\u30FF\\u31F0-\\u31FF\\u32D0-\\u32FE\\u3300-\\u3357\\uFF66-\\uFF6F\\uFF71-\\uFF9D]|\\uD82C\\uDC00";
var pHan = "[\\u2E80-\\u2E99\\u2E9B-\\u2EF3\\u2F00-\\u2FD5\\u3005\\u3007\\u3021-\\u3029\\u3038-\\u303B\\u3400-\\u4DB5\\u4E00-\\u9FD5\\uF900-\\uFA6D\\uFA70-\\uFAD9]|[\\uD840-\\uD868\\uD86A-\\uD86C\\uD86F-\\uD872][\\uDC00-\\uDFFF]|\\uD869[\\uDC00-\\uDED6\\uDF00-\\uDFFF]|\\uD86D[\\uDC00-\\uDF34\\uDF40-\\uDFFF]|\\uD86E[\\uDC00-\\uDC1D\\uDC20-\\uDFFF]|\\uD873[\\uDC00-\\uDEA1]|\\uD87E[\\uDC00-\\uDE1D]";
var rx = new RegExp("^([\\w-]|" + pHiragana + "|" + pKatakana + "|" + pHan + ")+$");
console.log(rx.test("werえ"));
console.log(rx.test("werえ3"));

score 0 · Answer 2 · edited May 23 '17 at 10:31


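Here's an example regex that matches Hiragana (Unicode block U+3040 to U+309F) alongside ASCII letters, digits and underscore: /[a-zA-Z0-9_\u3040-\u309F]+/ http://regexr.com/3frf9 As written it is unanchored and omits the dash, so to validate a whole username wrap it in ^ and $ and add - to the class.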

You can alter this to add other scripts or languages. You may want to check out this answer to see some of the other Unicode ranges, or look them up elsewhere.
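
As a sketch of that extension, the class below adds the block ranges suggested in the comments under the question (kana, CJK ideographs, compatibility ideographs, half-width katakana), plus the dash, and anchors the pattern with `^` and `$` so the whole string is validated. These are block ranges rather than script properties, so they also admit some non-letter code points inside those blocks; `looseRx` and `blockRx` are illustrative names.

```javascript
// Unanchored original: succeeds as soon as ANY run of allowed chars is found.
const looseRx = /[a-zA-Z0-9_\u3040-\u309F]+/;
console.log(looseRx.test("!!a!!"));       // true, although "!" is not allowed

// Anchored, with dash and the block ranges from the comments under the question.
const blockRx = /^[a-zA-Z0-9_\-\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]+$/;
console.log(blockRx.test("werえ"));       // true
console.log(blockRx.test("ラーメン_1"));   // true: U+30FC lies inside the kana range
console.log(blockRx.test("漢字-abc"));     // true
console.log(blockRx.test("!!a!!"));       // false
```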

answered Apr 27 '17 at 12:04 by jas7457