Change RegEx to allow for both English & Japanese characters

Question

This is my regular expression code:

"onlyLetterSp": {
    "regex": /^[a-zA-Z\ \']+$/,
    "alertText": "* Letters only"
}

How can I change this to allow English characters as well as Japanese?

You have my attention sir, I'll wait for the answer with you. No idea how to help you :( — martriay, Mar 01 '13 at 05:37
My answer for JS (also usable for PHP), though I am still researching a more complete answer: http://stackoverflow.com/questions/15033196/using-javascript-to-check-whether-a-string-contains-japanese-characters-includi/15034560#15034560 — nhahtdh, Mar 01 '13 at 05:52
Take a look [here](http://www.unicode.org/reports/tr18/). For any Unicode letter (a word character minus digits) it appears to be \p{L}, but it's late and the doc is huge, so I was just skimming. The uppercase L looked odd to me at first, since uppercase usually negates, but L is just the property name ("Letter"); the negated form is \P{L}. The doc also shows examples of excluding sets, such as Greek. I hope this is useful. — Victoria French, Mar 01 '13 at 07:35
@VictoriaFrench: Set intersection and set subtraction are not implemented by PCRE, AFAIK. Only Java regex implements character set intersection and union. — nhahtdh, Mar 01 '13 at 08:44
Yep, you are right. According to [this page](http://www.regular-expressions.info/javascript.html), JavaScript offers **No Unicode support, except for matching single characters with \uFFFF**. But I did find [this page](http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/), which lists the Unicode ranges for Japanese text in regex form, and I also stumbled across [this GitHub gist](https://gist.github.com/ryanmcgrath/982242), which may help with experimenting. I'm amazed this is so hard to solve. — Victoria French, Mar 01 '13 at 17:40
Your tags are a bit confusing. I get the [regex]. The code looks like JavaScript, so I kind of get the [jquery] one. But what about [php]? — Chris Wesseling, Mar 09 '13 at 12:04

score 3 · Answer 1 · answered Mar 01 '13 at 05:45

I found this link:

http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/

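There are apparently separate Unicode ranges for the different Japanese scripts.

Hiragana, for example, is U+3041 to U+3096. In a JavaScript regex the code points must be written with \u, because \x only takes two hex digits:

[\u3041-\u3096]

In PHP (PCRE), the equivalent is [\x{3041}-\x{3096}] together with the u modifier.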
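A quick check of the Hiragana range in JavaScript, as a minimal sketch assuming any standard engine (Node or a browser). The variable name hiragana is illustrative only. It also shows why the \x spelling of the range is not an option in JavaScript.

```javascript
// Hiragana block: U+3041..U+3096, anchored so the whole string must be Hiragana.
const hiragana = /^[\u3041-\u3096]+$/;

console.log(hiragana.test("ひらがな")); // true
console.log(hiragana.test("カタカナ")); // false (Katakana is U+30A0..U+30FF)

// "[\x3041-\x3096]" does not compile: \x consumes only two hex digits, so the
// class parses as "\x30", "4", then a range from "1" down to "\x30" ("0"),
// which is out of order.
try {
  new RegExp("[\\x3041-\\x3096]");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

Note that this range alone leaves out Katakana and Kanji, so it is only one piece of a Japanese-text check.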

answered Mar 01 '13 at 05:45

Emery King

I want to check for English characters and also Japanese ones. – Ketan patil Mar 01 '13 at 05:48
@MarshallHouse: That is only Hiragana. Japanese text also uses Katakana and Kanji (Kanji belong to the CJK Unified Ideographs block). – nhahtdh Mar 01 '13 at 05:56
Perhaps /^[\u3041-\u3096\u30A0-\u30FF\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A\u2E80-\u2FD5a-zA-Z]+$/ in JavaScript. In PHP, each code point needs braces (\x{3041} and so on), and the u modifier is needed there; it goes after the closing delimiter, not before the $, e.g. /^[\x{3041}-\x{3096}a-zA-Z]+$/u. – Victoria French Mar 01 '13 at 17:50
This is clearly the way to go: put the Unicode ranges inside the character class. – bgusach Mar 10 '13 at 10:03
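The range-based approach from these comments can be sketched as a rule in the same shape as the question's "onlyLetterSp". This is a minimal sketch under a few assumptions: the rule name onlyLetterSpJa is hypothetical, the validator is assumed to accept any RegExp in the regex field, and the space and apostrophe from the original rule are kept. Because it uses \u escapes it needs no u flag, so it works in older engines too.

```javascript
// Hypothetical rule name; same shape as "onlyLetterSp" in the question.
// Character class contents:
//   a-zA-Z, space, '   ASCII letters plus the original rule's space and apostrophe
//   \u3041-\u3096       Hiragana
//   \u30A0-\u30FF       Katakana (includes the prolonged sound mark U+30FC)
//   \u3400-\u4DB5       CJK Unified Ideographs Extension A
//   \u4E00-\u9FCB       CJK Unified Ideographs (most Kanji)
//   \uF900-\uFA6A       CJK Compatibility Ideographs
//   \u2E80-\u2FD5       CJK Radicals Supplement and Kangxi Radicals
const onlyLetterSpJa = {
  regex: /^[a-zA-Z '\u3041-\u3096\u30A0-\u30FF\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A\u2E80-\u2FD5]+$/,
  alertText: "* Letters only",
};

console.log(onlyLetterSpJa.regex.test("O'Brien"));     // true
console.log(onlyLetterSpJa.regex.test("コーヒー"));     // true
console.log(onlyLetterSpJa.regex.test("日本語 text")); // true
console.log(onlyLetterSpJa.regex.test("abc123"));      // false
```

One limitation of this design: \uXXXX ranges only cover the Basic Multilingual Plane, so rare Kanji encoded as surrogate pairs (CJK Extension B and later) will not match.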

score 1 · Answer 2 · edited Mar 10 '13 at 10:17

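You may be looking for the u regex modifier, which stands for Unicode. With it you can use Unicode property escapes such as \p{L} (any letter, in any script) instead of listing ranges by hand. Be careful with shorthand classes like \w: they are Perl-style rather than POSIX, they also match digits and underscores, and in JavaScript \w stays ASCII-only even with the u flag.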
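A minimal sketch of the Unicode-property approach in JavaScript, assuming an ES2018+ engine (for example Node 10+ or a current browser), since \p{...} only works with the u flag. The names anyLetterSp and englishJapaneseSp are illustrative. It contrasts the loose \p{L} class with one restricted to ASCII letters plus the scripts Japanese is written in.

```javascript
// Loose: any letter in any script, plus space and apostrophe.
const anyLetterSp = /^[\p{L} ']+$/u;

// Stricter: ASCII letters plus Hiragana, Katakana and Han (Kanji).
// Script_Extensions (rather than Script) also picks up marks shared by
// Hiragana and Katakana, such as the prolonged sound mark "ー" (U+30FC).
const englishJapaneseSp =
  /^[a-zA-Z '\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}\p{Script_Extensions=Han}]+$/u;

console.log(anyLetterSp.test("Привет"));                 // true: Cyrillic letters are \p{L}
console.log(englishJapaneseSp.test("Привет"));           // false
console.log(englishJapaneseSp.test("コーヒー and 日本語")); // true
console.log(englishJapaneseSp.test("abc123"));           // false

// Under the u flag, needless escapes like "\ " and \' from the original
// rule are rejected, so they must be written unescaped.
try {
  new RegExp("^[a-zA-Z\\ \\']+$", "u");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

The trade-off: \p{L} is the simplest but accepts letters from every script, while the Script_Extensions version stays closer to "English and Japanese" and may also admit some CJK punctuation that those scripts share.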

edited Mar 10 '13 at 10:17

Shimon Rachlenko

answered Mar 10 '13 at 09:56

Vasyl Zhuk