
This is my regular expression code:

"onlyLetterSp": {
    "regex": /^[a-zA-Z\ \']+$/,
    "alertText": "* Letters only"
}

How can I change this to allow English characters as well as Japanese?
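As a quick check of the current behavior, here is a small sketch that runs the question's pattern against a few sample strings. The `rules` wrapper and the sample names are made up for illustration. Every character must be an ASCII letter, a space, or an apostrophe, so any Hiragana or Kanji input is rejected.

```javascript
// The rule from the question, wrapped in a hypothetical object so the pattern can be run directly.
// The "\ " and "\'" escapes are harmless here: they just mean a space and an apostrophe.
const rules = {
  onlyLetterSp: {
    regex: /^[a-zA-Z\ \']+$/,
    alertText: "* Letters only",
  },
};

// Sample inputs: ASCII names pass, any Japanese character fails.
const samples = ["John O'Brien", "Mary Ann", "やまだ", "山田", "John3"];
for (const s of samples) {
  console.log(JSON.stringify(s), rules.onlyLetterSp.regex.test(s));
}
```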

Danny Beckett
Ketan patil
  • You have my attention, sir. I'll wait for the answer with you. No idea how to help you :( – martriay Mar 01 '13 at 05:37
  • My answer for JS (also usable for PHP), though I am currently researching a more complete answer: http://stackoverflow.com/questions/15033196/using-javascript-to-check-whether-a-string-contains-japanese-characters-includi/15034560#15034560 – nhahtdh Mar 01 '13 at 05:52
  • Take a look [here](http://www.unicode.org/reports/tr18/): for any Unicode word character minus digits, it appears to be \p{L}, but it's late and the doc is huge, so I was just skimming. I thought it was odd that an uppercase letter is used, as those usually negate. The doc shows examples of excluding sets, such as Greek. I hope this is useful. – Victoria French Mar 01 '13 at 07:35
  • @VictoriaFrench: Set intersection and set subtraction are not implemented by PCRE, AFAIK. Only Java regex implements character set intersection and union. – nhahtdh Mar 01 '13 at 08:44
  • Yep, you are right. According to [this page](http://www.regular-expressions.info/javascript.html), JavaScript offers **No Unicode support, except for matching single characters with \uFFFF**. But I did find [this page](http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/) that shows the Unicode character ranges for regex, and I also stumbled across [this github project](https://gist.github.com/ryanmcgrath/982242) that may help with experimenting. Amazed this is so hard to solve. – Victoria French Mar 01 '13 at 17:40
  • Your tags are a bit confusing. I get the [regex] tag. The code looks like JavaScript, so I kind of get the [jquery] one. But what about [php]? – Chris Wesseling Mar 09 '13 at 12:04
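Since these comments were written, JavaScript has gained Unicode property escapes (ES2018, usable only with the u flag), which cover the \p{...} idea without hand-written ranges. A minimal sketch, assuming a modern engine such as a current Node.js or browser; the name englishOrJapanese is illustrative only:

```javascript
// ASCII letters, space and apostrophe from the original rule, plus the three
// Japanese scripts. \p{Script=...} requires the u flag (ES2018 and later).
// Under the u flag "\ " and "\'" are invalid escapes, so they are written unescaped.
const englishOrJapanese =
  /^[A-Za-z '\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]+$/u;

for (const s of ["John O'Brien", "やまだ", "ヤマダ", "山田 太郎", "John3"]) {
  console.log(JSON.stringify(s), englishOrJapanese.test(s));
}
```

Script=Han is used here instead of \p{L} so that letters from other scripts (Greek, Cyrillic, and so on) are still rejected. Some shared marks, such as the prolonged sound mark ー (U+30FC), have Script=Common rather than Katakana or Hiragana, so they would need to be added to the class explicitly if the field must accept them.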

2 Answers


I found this link:

http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/

There are apparently separate Unicode ranges for the different Japanese scripts.

Hiragana, for example, is the range U+3041 to U+3096, which in a JavaScript regex character class is written:

[\u3041-\u3096]
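In JavaScript, \x takes exactly two hex digits, so a four-digit code point such as U+3041 has to be written \u3041. Written as [\x3041-\x3096], the class is parsed as \x30 followed by plain digits, which produces an out-of-order range and a SyntaxError. A minimal Hiragana-only sketch (the name hiraganaOnly is just for illustration):

```javascript
// Hiragana block range U+3041..U+3096.
// \x only takes two hex digits in JavaScript, so \u is needed for these code points.
const hiraganaOnly = /^[\u3041-\u3096]+$/;

console.log(hiraganaOnly.test("ひらがな")); // true
console.log(hiraganaOnly.test("カタカナ")); // false: Katakana lives in U+30A0..U+30FF
console.log(hiraganaOnly.test("abc"));      // false
```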
Emery King
  • I want to check for English characters and also Japanese. – Ketan patil Mar 01 '13 at 05:48
  • @MarshallHouse: That is only Hiragana. Japanese text also contains Katakana and Kanji (the latter belong to the CJK ideograph blocks). – nhahtdh Mar 01 '13 at 05:56
  • Perhaps /^[\u3041-\u3096\u30A0-\u30FF\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A\u2E80-\u2FD5a-zA-Z]+$/u (flags such as u go after the closing slash, not before the $). I have been reading that /u is needed, though. – Victoria French Mar 01 '13 at 17:50
  • This is clearly the way to go: put the Unicode ranges inside the regex character class. – bgusach Mar 10 '13 at 10:03
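Putting the ranges from these comments into the question's rule could look like the sketch below. It is a hypothetical rewrite, not a tested drop-in for any particular validation plugin; it keeps the original ASCII letters, space, and apostrophe, and builds the character class from a commented array so each range stays labeled.

```javascript
// Hypothetical rewrite of the question's rule. Ranges are the ones listed in the
// comments; ASCII letters, space and apostrophe are kept from the original.
const japaneseRanges = [
  "\\u3041-\\u3096", // Hiragana
  "\\u30A0-\\u30FF", // Katakana
  "\\u3400-\\u4DB5", // CJK Unified Ideographs Extension A
  "\\u4E00-\\u9FCB", // CJK Unified Ideographs (Kanji)
  "\\uF900-\\uFA6A", // CJK Compatibility Ideographs
  "\\u2E80-\\u2FD5", // CJK Radicals Supplement and Kangxi Radicals
];

const onlyLetterSp = {
  regex: new RegExp("^[a-zA-Z '" + japaneseRanges.join("") + "]+$"),
  alertText: "* Letters only",
};

for (const s of ["Yamada Taro", "山田 太郎", "やまだ", "ヤマダ", "佐々木", "John3"]) {
  console.log(JSON.stringify(s), onlyLetterSp.regex.test(s));
}
```

This is still a block-range approximation: 佐々木 is rejected because the iteration mark 々 (U+3005) sits outside every listed range, and full-width Latin letters such as Ａ (U+FF21) are rejected too.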

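You are probably looking for the u regex modifier, which stands for Unicode. In PHP (PCRE), it also makes shorthand classes such as \w match "word" characters from any script, not just ASCII. Note that \w is not a letters-only class: it matches digits and the underscore as well.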
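In JavaScript the picture is different: without the i flag, \w is always [A-Za-z0-9_], even with u, so the Unicode-aware tool there is \p{L} (ES2018, u flag required). A minimal sketch comparing the two; the pattern names are illustrative only:

```javascript
// In JavaScript, \w stays ASCII-only ([A-Za-z0-9_]) even with the u flag.
const wordU = /^\w+$/u;

// \p{L} matches a letter from any script; it requires the u flag (ES2018+).
const lettersU = /^[\p{L} ']+$/u;

console.log(wordU.test("やまだ"));          // false
console.log(lettersU.test("やまだ"));       // true
console.log(lettersU.test("John O'Brien")); // true
console.log(lettersU.test("John3"));        // false: digits are not letters
```

\p{L} accepts letters from every script, so it is broader than "English and Japanese only"; script properties such as \p{Script=Han} are the narrower choice.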

Shimon Rachlenko
Vasyl Zhuk