Check if string latin or cyrillic

Question

Is there a way to check whether a string is Latin or Cyrillic? I've tried the String method localizedCompare, but it didn't give me the result I needed.

Do you mean if a string contains *only latin* or *only cyrillic* characters? Because a string can contain both (plus some others like greek, chinese, arabic, hebrew ...) — Martin R, Aug 02 '16 at 14:10
can help you [here](http://nshipster.com/cfstringtransform/) — Özgür Ersil, Aug 02 '16 at 14:26
AMomchilov, then I need to get 'false'. A built-in or third-party method that checks the characters would be fine too. — Ookey, Aug 02 '16 at 14:29

Alonso Urbano · Accepted Answer · 2016-08-02T15:19:31.780

What about something like this?

extension String {
    var isLatin: Bool {
        let upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        let lower = "abcdefghijklmnopqrstuvwxyz"

        for c in self.characters.map({ String($0) }) {
            if !upper.containsString(c) && !lower.containsString(c) {
                return false
            }
        }

        return true
    }

    var isCyrillic: Bool {
        let upper = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
        let lower = "абвгдежзийклмнопрстуфхцчшщьюя"

        for c in self.characters.map({ String($0) }) {
            if !upper.containsString(c) && !lower.containsString(c) {
                return false
            }
        }

        return true
    }

    var isBothLatinAndCyrillic: Bool {
        return self.isLatin && self.isCyrillic
    }
}

Usage:

let s = "Hello"
if s.isLatin && !s.isBothLatinAndCyrillic {
    // String is latin
} else if s.isCyrillic && !s.isBothLatinAndCyrillic {
    // String is cyrillic
} else if s.isBothLatinAndCyrillic {
    // String can be either latin or cyrillic
} else {
    // String is not latin nor cyrillic
}

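Consider that a string can look as if it belongs to both scripts, for example:

let s = "A"

This letter exists in both alphabets: the Latin "A" (U+0041) and the Cyrillic "А" (U+0410) look the same but are different characters, so each one passes only one of the two checks. Because the two letter lists above share no characters, isBothLatinAndCyrillic can be true only for an empty string.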
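
One way to find out which "A" a string actually holds is to look at its Unicode scalar value. The following is a minimal sketch in current Swift; the constant names are illustrative and not part of the answer above.

```swift
// Latin capital A (U+0041) and Cyrillic capital A (U+0410) render alike
// but are different characters.
let latinA: Character = "\u{0041}"
let cyrillicA: Character = "\u{0410}"

print(latinA == cyrillicA) // false

// Inspect the scalar value behind each character.
let latinValue = latinA.unicodeScalars.first!.value
let cyrillicValue = cyrillicA.unicodeScalars.first!.value
print(String(latinValue, radix: 16))    // 41
print(String(cyrillicValue, radix: 16)) // 410
```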

And it can also be none of them:

let s = "*"

This is not a good solution, at least for iOS 11. Please check https://stackoverflow.com/questions/47890747/how-to-detect-text-string-language-in-ios/47890753#47890753 — Eugene P, Feb 28 '18 at 16:30
True. I've added this for anyone looking for a similar solution. — Eugene P, Mar 06 '18 at 11:17
This solution is incomplete. Polish "Ś", Slovak "Ŕ", French "Ç" and many other characters with accents are still considered latin. Similar problem exists with cyrillic - you listed only the modern Russian alphabet, but there is also Serbian "Џ", Macedonian "Њ", Tajik "Ғ" and others. — Martin Grey, May 03 '21 at 09:04

Duyen-Hoa · Answer 2 · 2016-08-02T14:55:57.567

You should iterate over the string's Unicode scalars and decide from each scalar's value whether it is a Cyrillic or a Latin character. This code is not complete; you can finish it yourself.

let a: String = "ӿ" // Unicode value U+04FF
let scalars = a.unicodeScalars

// Get the Unicode value of the first character:
let unicodeValue = scalars[scalars.startIndex].value // 1279, which corresponds to 0x04FF

See this chart for all Unicode values (in hexadecimal): http://jrgraphix.net/r/Unicode/0400-04FF

According to that chart, Cyrillic values run from 0x0400 to 0x04FF (1024 to 1279).

This is the code for the Cyrillic check:

var isCyrillic = true
for unicode in scalars {
    // Anything outside U+0400...U+04FF (1024...1279) is not in the Cyrillic block
    if unicode.value < 1024 || unicode.value > 1279 {
        print("not a cyrillic text")
        print(unicode.value)
        isCyrillic = false
        break
    }
}
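
The loop above can be wrapped in a reusable function. This is a minimal sketch in current Swift, assuming that only the Cyrillic block U+0400 to U+04FF counts and that an empty string should return false. Both are assumptions rather than part of the answer, and the name `containsOnlyCyrillic` is hypothetical.

```swift
// Hypothetical helper: true when every scalar lies in the
// Cyrillic block (U+0400...U+04FF). Spaces and punctuation fail the check.
func containsOnlyCyrillic(_ text: String) -> Bool {
    let cyrillicBlock: ClosedRange<UInt32> = 0x0400...0x04FF
    // Assumption: an empty string is not considered Cyrillic.
    guard !text.isEmpty else { return false }
    return text.unicodeScalars.allSatisfy { cyrillicBlock.contains($0.value) }
}

print(containsOnlyCyrillic("привет"))     // true
print(containsOnlyCyrillic("привет мир")) // false (contains a space)
print(containsOnlyCyrillic("hello"))      // false
```

Whether spaces, digits, or the Cyrillic Supplement block (U+0500 to U+052F) should be accepted depends on the use case; the range constant is the place to adjust that.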

Code Different · Answer 3 · 2016-08-02T15:05:18.477

Surprisingly, there's no easy answer to your question. The Latin alphabet contains more than just A - Z. There are accented characters in French and archaic forms in German, etc. I don't know the Cyrillic alphabet so I'll leave it alone. On top of that, you have to deal with: punctuation (.,?"(), etc.) and white space, emojis, arrows, dingbats... which are language neutral. The complexity can escalate very quickly depending on your requirements.

The answer you accepted is inadequate to say the least: "hello world".isLatin == false since it doesn't deal with white spaces.

Visit a site like this one to learn what ranges contain characters for which language and play with the code below. It's not a complete answer but meant to get you started:

let neutralRanges  = [0x20...0x40]
let latinRanges    = [0x41...0x5A, 0x61...0x7A, 0xC0...0xFF, 0x100...0x17F]
let cyrillicRanges = [0x400...0x4FF, 0x500...0x52F]

// Returns true if the scalar's value falls inside any of the given ranges
func scalar(_ scalar: UnicodeScalar, isInRanges ranges: [ClosedRange<Int>]) -> Bool {
    for r in ranges {
        if r ~= Int(scalar.value) {
            return true
        }
    }

    return false
}

let str = "Hello world"
var isLatin = true
var isCyrillic = true

for s in str.unicodeScalars {
    // Neutral characters (space, digits, ASCII punctuation) affect neither flag
    if scalar(s, isInRanges: neutralRanges) {
        continue
    }
    // Test each script independently so a character from a third script clears both flags
    if !scalar(s, isInRanges: latinRanges) {
        isLatin = false
    }
    if !scalar(s, isInRanges: cyrillicRanges) {
        isCyrillic = false
    }
}

print(isLatin)
print(isCyrillic)
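
Instead of maintaining range tables by hand, one option is to let the regular-expression engine match on Unicode script properties. The sketch below assumes Foundation's ICU-backed regular expressions, which accept `\p{Latin}` and `\p{Cyrillic}`, and it treats whitespace, digits, and punctuation as neutral. That choice is an assumption, and the helper name `matchesScript` is hypothetical.

```swift
import Foundation

// Hypothetical helper: true when the string has at least one letter of the
// given script and otherwise only whitespace, digits, or punctuation.
func matchesScript(_ text: String, script: String) -> Bool {
    // \p{<script>} matches one character of that Unicode script;
    // \s, \d and \p{P} are treated as script-neutral here (an assumption).
    let pattern = "^[\\s\\d\\p{P}]*\\p{\(script)}[\\p{\(script)}\\s\\d\\p{P}]*$"
    return text.range(of: pattern, options: .regularExpression) != nil
}

print(matchesScript("Hello, world", script: "Latin"))   // true
print(matchesScript("Привет, мир", script: "Cyrillic")) // true
print(matchesScript("Hello, мир", script: "Latin"))     // false
```

Because the script property comes from the Unicode data rather than a hand-written list, precomposed accented Latin letters and non-Russian Cyrillic letters are covered as well. Symbols outside the punctuation category, such as "+", would still fail this particular pattern.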

You mentioned additional latin characters in different latin languages. The same thing happens in Cyrillic, too - Serbians have "Џ", Macedonians "Њ", Tajik "Ғ", etc. — Martin Grey, May 03 '21 at 09:11

score 1 · Answer 4 · edited Nov 29 '18 at 15:41

A couple of comments refer to another post that shows a fairly clean way to determine the language of a String using NSLinguisticTagger (How to detect text (string) language in iOS? ).

NSLinguisticTagger is definitely the best approach here and is intended exactly for this purpose, but it sounds to me like you're actually asking how to identify the script of the String rather than the language. English, French, German (for example) all use Latin script so the language example above doesn't show the ideal way to discern between Latin and Cyrillic (or other scripts).

Instead I wrote the following extension to String that shows how to identify the script for the first sentence in the String you supply - you can then easily adapt/build on this to get the exact thing you want for your use case:

import Foundation // Needed for NSLinguisticTagger

extension String {
    func scriptCode() -> NSLinguisticTag? {
        let linguisticTagger = NSLinguisticTagger(tagSchemes: [.script], options: 0)

        linguisticTagger.string = self

        return linguisticTagger.tag(at: 0, unit: .sentence, scheme: .script, tokenRange: nil)
    }
}

Scripts are uniformly described by four-letter ISO 15924 script codes, such as "Latn", and this is what you get with the returned NSLinguisticTag object. To perform a comparison, just check the raw value of NSLinguisticTag, for example like this:

let scriptCode = yourTestSentence.scriptCode()?.rawValue
if scriptCode == "Latn" || scriptCode == "Cyrl" {
    print("This sentence is in Latin or Cyrillic script")
} else {
    print("Some other script")
}

Caveat: This example only checks the first sentence of whatever string you supply. I haven't tested what happens if that sentence is mixed scripts - most likely the returned tag will be nil.

Here are some useful reference links to Apple's docs, and Wikipedia for more info:

Jovan Ivanov · Answer 5 · 2020-08-13T12:06:14.113

I hope that this also can be useful

 let cyrillicToLatinMap: [Character : String] = [
" ":" ",
"А":"A",
"Б":"B",
"В":"V",
"Г":"G",
"Д":"D",
"Е":"E",
"Ж":"Zh",
"З":"Z",
"И":"I",
"Й":"Y",
"К":"K",
"Л":"L",
"М":"M",
"Н":"N",
"О":"O",
"П":"P",
"Р":"R",
"С":"S",
"Т":"T",
"У":"U",
"Ф":"F",
"Х":"H",
"Ц":"Ts",
"Ч":"Ch",
"Ш":"Sh",
"Щ":"Sht",
"Ъ": "A",
"Ю":"Yu",
"Я":"Ya",
"а":"a",
"б":"b",
"в":"v",
"г":"g",
"д":"d",
"е":"e",
"ж":"zh",
"з":"z",
"и":"i",
"й":"y",
"к":"k",
"л":"l",
"м":"m",
"н":"n",
"о":"o",
"п":"p",
"р":"r",
"с":"s",
"т":"t",
"у":"u",
"ф":"f",
"х":"h",
"ц":"ts",
"ч":"ch",
"ш":"sh",
"щ":"sht",
"ъ": "a",
"ь":"y",
"ю":"yu",
"я":"ya",]

Bulgarian Cyrillic to Latin

class CyrilicToLatinConverter {

    // Transliterates a Cyrillic word to Latin.
    // Returns "" as soon as a character outside the map is found.
    public static func getLatin(wordInCyrillic: String) -> String {
        var wordInLatin = ""
        for character in wordInCyrillic {
            guard let latin = cyrillicToLatinMap[character] else {
                return ""
            }
            wordInLatin += latin
        }
        return wordInLatin
    }

    // A character counts as Cyrillic if it is a key in the map
    // (note that the map also contains the space character).
    public static func isCyrillic(characters: Character) -> Bool {
        return cyrillicToLatinMap[characters] != nil
    }
}

Mohammad Razipour · Answer 6 · 2017-03-01T04:54:20.990

Swift 3: For Persian and Arabic

extension String {

    var isFarsi: Bool {

        //Remove extra spaces from the first and last word
        let value = self.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)

        if value == "" {
            return false
        }

        let farsiLetters = "آ ا ب پ ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ی ن و ه"
        let arabicLetters = " ء ا أ إ ء ؤ ئـ ئ آ اً ة ا ب ت ث ج ‌ ح خ د ذ ر ز س ‌ ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي"
        for c in value.characters.map({ String($0) }) {
            if !farsiLetters.contains(c) && !arabicLetters.contains(c) {
                return false
            }
        }

        return true
    }      

}

score 0 · Answer 7 · answered Apr 02 '20 at 09:08

Swift 5 solution

extension String {
    var isLatin: Bool {
        let upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        let lower = "abcdefghijklmnopqrstuvwxyz"
        for c in self.map({String($0)}) where !upper.contains(c) && !lower.contains(c) {
            return false
        }
        return true
    }
}
