10

I want to know that how can I check if a string contains Chinese in Swift?

For example, I want to check if there's Chinese inside:

var myString = "Hi! 大家好!It's contains Chinese!"

Thanks!

He Yifei 何一非
  • 2,592
  • 4
  • 38
  • 69

5 Answers5

15

This answer to How to determine if a character is a Chinese character can also easily be translated from Ruby to Swift (now updated for Swift 3):

extension String {
    var containsChineseCharacters: Bool {
        return self.range(of: "\\p{Han}", options: .regularExpression) != nil
    }
}

if myString.containsChineseCharacters {
    print("Contains Chinese")
}

In a regular expression, "\p{Han}" matches all characters with the "Han" Unicode property, which – as I understand it – are the characters from the CJK languages.

Community
  • 1
  • 1
Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382
  • and is there anyway I can get just the Chinese words? Thanks! – He Yifei 何一非 Jul 06 '15 at 12:05
  • @Arefly: Unfortunately, I am not an expert for the Chinese language, I have "blindly" translated the Ruby code :) There are also "Katakana" and "Hiragana" properties, but I don't know if they are of any use. – Martin R Jul 06 '15 at 12:12
  • @Arefly: For more fine-grained control, Airspeed Velocity's answer might be better suited because you can adjust the table with the Unicode ranges according to your needs, such as "only Chinese characters". – Martin R Jul 06 '15 at 12:39
5

Looking at questions on how to do this in other languages (such as this accepted answer for Ruby) it looks like the common technique is to determine if each character in the string falls in the CJK range. The ruby answer could be adapted to Swift strings as extension with the following code:

extension String {
    var containsChineseCharacters: Bool {
        return self.unicodeScalars.contains { scalar in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

// true:
"Hi! 大家好!It's contains Chinese!".containsChineseCharacters
// false:
"Hello, world!".containsChineseCharacters

The ranges may already exist in Foundation somewhere rather than manually hardcoding them.

The above is for Swift 2.0, for earlier, you will have to use the free contains function rather than the protocol extension (twice):

extension String {
    var containsChineseCharacters: Bool {
        return contains(self.unicodeScalars) {
          // older version of compiler seems to need extra help with type inference 
          (scalar: UnicodeScalar)->Bool in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return contains(cjkRanges) { $0.contains(scalar.value) }
        }
    }
}
Community
  • 1
  • 1
Airspeed Velocity
  • 40,491
  • 8
  • 113
  • 118
3

The accepted answer only find if string contains Chinese character, i created one suit for my own case:

enum ChineseRange {
    case notFound, contain, all
}

extension String {
    var findChineseCharacters: ChineseRange {
        guard let a = self.range(of: "\\p{Han}*\\p{Han}", options: .regularExpression) else {
            return .notFound
        }
        var result: ChineseRange
        switch a {
        case nil:
            result = .notFound
        case self.startIndex..<self.endIndex:
            result = .all
        default:
            result = .contain
        }
        return result
    }
}

if "你好".findChineseCharacters == .all {
    print("All Chinese")
}

if "Chinese".findChineseCharacters == .notFound {
    print("Not found Chinese")
}

if "Chinese你好".findChineseCharacters == .contain {
    print("Contains Chinese")
}

gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a

William Hu
  • 15,423
  • 11
  • 100
  • 121
2

Try this in Swift 2:

var myString = "Hi! 大家好!It's contains Chinese!"

var a = false

for c in myString.characters {
    let cs = String(c)
    a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
}
print("\(myString) contains Chinese characters = \(a)")
Grimxn
  • 22,115
  • 10
  • 72
  • 85
0

I have created a Swift 3 String extension for checking how much Chinese characters a String contains. Similar to the code by Airspeed Velocity but more comprehensive. Checking various Unicode ranges to see whether a character is Chinese. See Chinese character ranges listed in the tables under section 18.1 in the Unicode standard specification: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf

The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift

Usage example:

let myString = "Hi! 大家好!It contains Chinese!"
let chinesePercentage = myString.chinesePercentage()
let chineseCharacterCount = myString.chineseCharactersCount()
print("String contains \(chinesePercentage) percent Chinese. That's \(chineseCharacterCount) characters.")
Niklas Berglund
  • 3,563
  • 4
  • 32
  • 32