Check whether a string contains Japanese/Chinese characters

Question

I need a way to check whether a string contains Japanese or Chinese text.

Currently I'm using this:

string.match(/[\u3400-\u9FBF]/);

but it does not match strings such as ディアボリックラヴァーズ or バッテリー.

Could you help me with that?

Thanks

If Japanese can be matched with `[一-龯]` and Chinese with `[\u4E00-\u9FFF\u3400-\u4DFF]`, try using `if (/[一-龯\u4E00-\u9FFF\u3400-\u4DFF]/.test(s)) { alert("Contains Japanese or Chinese chars!"); }` — Wiktor Stribiżew, Apr 14 '17 at 20:36
@WiktorStribiżew No, that's incorrect. Japanese includes characters outside the CJK range. — , Apr 14 '17 at 20:38
Ok, replace the JA one with [`[\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u4E00-\u9FAF\u2605-\u2606\u2190-\u2195\u203B]`](https://regex101.com/r/a5z6kc/1). — Wiktor Stribiżew, Apr 14 '17 at 20:40
That's even weirder… some of the characters you're including, like U+2605 and U+2606, have nothing to do with Chinese or Japanese at all. (They're ★ and ☆.) — , Apr 14 '17 at 20:42
@duskwuff: See [this resource](https://gist.github.com/ryanmcgrath/982242): *Non-Japanese punctuation/formatting characters commonly used in Japanese text*. Yeah, [`/[\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u4E00-\u9FAF\u203B\u4E00-\u9FFF\u3400-\u4DFF]/`](https://regex101.com/r/a5z6kc/2) might be enough. — Wiktor Stribiżew, Apr 14 '17 at 20:45
Or a bit [more complex regex with all possible Chinese chars](https://regex101.com/r/a5z6kc/4). — Wiktor Stribiżew, Apr 14 '17 at 20:50

score 31 · Accepted Answer · answered Apr 14 '17 at 20:51

The ranges of Unicode characters which are routinely used for Chinese and Japanese text are:

U+3040 - U+30FF: hiragana and katakana (Japanese only)
U+3400 - U+4DBF: CJK unified ideographs extension A (Chinese, Japanese, and Korean)
U+4E00 - U+9FFF: CJK unified ideographs (Chinese, Japanese, and Korean)
U+F900 - U+FAFF: CJK compatibility ideographs (Chinese, Japanese, and Korean)
U+FF66 - U+FF9F: half-width katakana (Japanese only)

As a regular expression, this would be expressed as:

/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/

This does not include every character which will appear in Chinese and Japanese text, but any significant piece of typical Chinese or Japanese text will be mostly made up of characters from these ranges.
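As a rough sketch of how the pattern behaves, the snippet below wraps it in a helper and runs it against the strings from the question, plus two characters that fall outside the listed ranges. The names `CJK_REGEX`, `containsCjk`, and `oldRegex` are illustrative and not part of the original answer.

```javascript
// Character class built from the ranges listed above.
const CJK_REGEX = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Hypothetical helper: true if at least one UTF-16 code unit is in the ranges.
function containsCjk(s) {
  return CJK_REGEX.test(s);
}

// The pattern from the question, kept for comparison.
const oldRegex = /[\u3400-\u9FBF]/;

console.log(oldRegex.test("バッテリー"));               // false: katakana (U+30A0-U+30FF) is outside its range
console.log(containsCjk("バッテリー"));                 // true
console.log(containsCjk("ディアボリックラヴァーズ"));   // true
console.log(containsCjk("Hello, world"));               // false

// Characters that appear in CJK text but are not covered by the ranges:
console.log(containsCjk("。"));         // false: U+3002 is in the CJK punctuation block
console.log(containsCjk("\u{20BB7}"));  // false: U+20BB7 is an Extension B ideograph outside the BMP
```

Because the regex has no `g` flag, `test()` keeps no state between calls, so the helper can be reused safely.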

Note that this regular expression will also match on Korean text that contains hanja. This is an unavoidable result of Han unification.
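To illustrate the point about hanja, here is a small sketch comparing hangul-only and hanja text. It also shows one possible way to narrow a match toward Japanese by testing only the ranges listed above as "Japanese only". `KANA_REGEX` is a name assumed here for illustration; a kana match suggests Japanese, but its absence does not prove the text is Chinese, since Japanese text can consist entirely of kanji.

```javascript
// Ranges from the answer above.
const CJK_REGEX = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Assumed subset: only the ranges the answer marks as "Japanese only".
const KANA_REGEX = /[\u3040-\u30ff\uff66-\uff9f]/;

console.log(CJK_REGEX.test("대한민국"));      // false: hangul syllables are outside every listed range
console.log(CJK_REGEX.test("大韓民國"));      // true: hanja are unified Han ideographs
console.log(KANA_REGEX.test("大韓民國"));     // false: no kana present
console.log(KANA_REGEX.test("検診センター")); // true: contains katakana
```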

To add Korean characters to the regex use the following: `\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f\u3131-\uD79D` — Paddy, May 04 '20 at 08:43

daviddna · Answer 2 · 2019-07-05T03:17:45.830

For Swift 4, I adapted the pattern to Swift's escape syntax and used NSRegularExpression to do the replacement. It might help someone:

[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]

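Extension methods on String:

import Foundation

extension String {
    /// Replaces every match of `pattern` with `replaceWith`.
    /// Leaves the string unchanged if the pattern is invalid.
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRegularExpression counts UTF-16 units, so derive the range
            // from the string itself instead of from `count`.
            let range = NSRange(startIndex..<endIndex, in: self)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            return
        }
    }

    /// Removes kana and CJK ideographs in the ranges listed above.
    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}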

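Example (the resulting string is ABC). The method is mutating, so it must be called on a variable rather than on a string literal:

var name = "ABC検診センター"
name.removeEastAsianChars()  // name is now "ABC"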

score 4 · Answer 3 · answered Feb 17 '20 at 04:56

You can use this code; it works for me.

const str = "渣打銀行提供一系列迎合你生活需要嘅信用卡";
// const str = "SGGRAND DING HOUSE 4GRAND DING HOUSE";
// Despite the name, this pattern also matches Japanese kana.
const REGEX_CHINESE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
// test() returns a boolean, which is all this check needs.
const hasChinese = REGEX_CHINESE.test(str);
if (hasChinese) {
  alert("Found");
} else {
  alert("Not Found");
}
