
It looks good for sentences, but it doesn't work for me with a single word. Per my requirement, I'm implementing a search where, as soon as the user has typed any 3 characters, I check which language the user is typing in. I accept that it may not detect every single word, but I expect it to work for the word "Islam".
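A minimal sketch of that requirement, assuming the NaturalLanguage framework (iOS 12 / macOS 10.14) instead of `NSLinguisticTagger`. The helper name `languageWhileTyping` and its `minimumLength` parameter are illustrative only, not part of any Apple API. It returns `nil` until the query is long enough to be worth guessing.

```swift
import Foundation
import NaturalLanguage

// Hypothetical helper: detect the language of a search query only once the
// user has typed at least `minimumLength` characters (surrounding whitespace ignored).
func languageWhileTyping(_ query: String, minimumLength: Int = 3) -> NLLanguage? {
    let trimmed = query.trimmingCharacters(in: .whitespacesAndNewlines)
    // Too short to guess: report "no language yet".
    guard trimmed.count >= minimumLength else { return nil }
    // Class convenience method on NLLanguageRecognizer; the result may still be
    // nil or .undetermined when no reasonable guess exists.
    return NLLanguageRecognizer.dominantLanguage(for: trimmed)
}
```

As the answer explains, a 3-character query will often be too ambiguous for a confident guess, so callers should be prepared for an unhelpful result.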

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [.tokenType, .language, .lexicalClass, .nameType, .lemma], options: 0)

func determineLanguage(for text: String) {
    tagger.string = text
    // dominantLanguage is an optional language code; "und" means "undetermined"
    let language = tagger.dominantLanguage ?? "none"
    print("The language is \(language)")
}


//Test case
determineLanguage(for: "I love Islam") // en -pass
determineLanguage(for: "আমি ইসলাম ভালোবাসি") // bn -pass
determineLanguage(for: "أنا أحب الإسلام") // ar -pass
determineLanguage(for: "Islam") // und - failed

Result:

The language is en
The language is bn
The language is ar
The language is und

What am I missing that causes "Islam" to be reported as an unknown language (`und`)?

Nazmul Hasan

1 Answer


Simply because the word exists in too many languages, and it would be unrealistic to guess the language from a single word. Context always helps.

For example:

import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("Islam")
print(recognizer.dominantLanguage!.rawValue)  //Force unwrapping for brevity

prints tr, which stands for Turkish. It's an educated guess.

If you want to see the other candidate languages as well, you could use `languageHypotheses(withMaximum:)`:

let hypotheses = recognizer.languageHypotheses(withMaximum: 10)

for (lang, confidence) in hypotheses.sorted(by: { $0.value > $1.value }) {
    print(lang.rawValue, confidence)
}

which prints:

tr 0.2332388460636139   //Turkish
hr 0.1371040642261505   //Croatian
en 0.12280254065990448  //English
pt 0.08051242679357529
de 0.06824589520692825
nl 0.05405258387327194
nb 0.050924140959978104
it 0.037797268480062485
pl 0.03097432479262352
hu 0.0288708433508873

You could then define a minimum confidence threshold and only accept the result when the top hypothesis reaches it.
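A minimal sketch of such a threshold check. The function name `acceptedLanguage` and the idea of a single cutoff value are assumptions for illustration; the threshold itself has to be tuned for your data. It is generic over the key type, so it accepts the `[NLLanguage: Double]` dictionary returned by `languageHypotheses(withMaximum:)` as well as plain string keys.

```swift
/// Returns the most likely language only if its confidence reaches `threshold`.
/// Hypothetical helper; works with [NLLanguage: Double] or [String: Double].
func acceptedLanguage<L: Hashable>(from hypotheses: [L: Double], threshold: Double) -> L? {
    // Pick the hypothesis with the highest confidence, if there is any.
    guard let best = hypotheses.max(by: { $0.value < $1.value }) else {
        return nil  // no hypotheses at all
    }
    // Reject the guess when the recognizer is not confident enough.
    return best.value >= threshold ? best.key : nil
}
```

With an (arbitrary) threshold of 0.5, `acceptedLanguage(from: recognizer.languageHypotheses(withMaximum: 10), threshold: 0.5)` would return `nil` for "Islam", since the top hypothesis above (`tr`) only reaches about 0.23.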


The language codes are the raw values of `NLLanguage` (language tags such as `tr`, `hr` and `en`).
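To show those codes to a user, Foundation's `Locale.localizedString(forLanguageCode:)` can turn them into readable names. A small sketch; the helper name `displayName(forLanguageCode:)` and the choice of English as the display locale are assumptions.

```swift
import Foundation

// Hypothetical helper: translate a raw language code (e.g. "tr") into a
// human-readable name, displayed in English; falls back to the code itself.
func displayName(forLanguageCode code: String) -> String {
    let english = Locale(identifier: "en")
    return english.localizedString(forLanguageCode: code) ?? code
}

// Print names for a few of the codes from the hypotheses listed above.
for code in ["tr", "hr", "en", "pt", "de"] {
    print(code, displayName(forLanguageCode: code))
}
```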

ielyamani
  • Could you tell me, please: can I specify in advance which languages the text may be in? For example, limit the word `Islam` to 3 languages: **English, Arabic, Bangla**. – Nazmul Hasan May 25 '19 at 03:54
  • @NazmulHasan you could compare the confidence associated with those languages in the `hypotheses`. In the example above, English would be the highest of the three, while Arabic and Bangla would be `nil` because they are not among the top 10 hypotheses. If it's still not clear enough, you could ask a new question and feel free to tag me. Ramadan Kareem! – ielyamani May 25 '19 at 10:41
  • Ramadan Kareem! – Nazmul Hasan May 25 '19 at 15:09
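Regarding the question in the comments about fixing the candidate languages in advance: `NLLanguageRecognizer` has a `languageConstraints` property (an array of `NLLanguage`) that restricts which languages it may return, and a softer `languageHints` property (a `[NLLanguage: Double]` dictionary of prior probabilities). A minimal sketch using constraints, assuming the app only supports English, Arabic and Bangla; the helper name is hypothetical.

```swift
import NaturalLanguage

// Hypothetical helper: only English, Arabic or Bangla can be returned.
func detectSupportedLanguage(in text: String) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    // Restrict the recognizer's output to the languages the app supports.
    recognizer.languageConstraints = [.english, .arabic, .bengali]
    recognizer.processString(text)
    return recognizer.dominantLanguage
}

if let language = detectSupportedLanguage(in: "Islam") {
    print(language.rawValue)
}
```

Constraining the candidates removes guesses like Turkish, but for a single Latin-script word the result is still only a best guess among the allowed languages.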