I'm currently working with speech recognition and detecting names in speech. This mostly works well, but I'm having issues with some names. I'm in Wales, and many people here (including me) have Welsh names. I have a CSV of all the Welsh names. Some names, like Osian, are also being picked up as places. Is there a way to extend NSLinguisticTagger (or NLTagger, which my code uses) to include Welsh names? Or is there another way of detecting a Welsh name?

Here is my current code:

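import NaturalLanguage

let text = "Hey I'm Osian"

// Create a tagger that uses the name-type scheme (people, places, organizations).
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

// Skip punctuation and whitespace, and treat multi-word names as one token.
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .organizationName, .placeName]

// Walk the text word by word and print every token tagged as a named entity.
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}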
matt
Osian
  • Isn’t that the whole point of the new Natural Language framework? You can teach the parser with machine learning. See for example this tutorial. https://heartbeat.fritz.ai/natural-language-in-ios-12-customizing-tag-schemes-and-named-entity-recognition-caf2da388a9f – matt Apr 09 '19 at 17:46
  • @matt but won’t I also have to give it a dataset of non-names, i.e. the rest of the dictionary? – Osian Apr 09 '19 at 21:14

1 Answer

Normally you need to set the dominant language, but it appears that Welsh is not supported. See: https://developer.apple.com/documentation/naturallanguage/nllanguage?language=objc

My guess is that the best approach is to set the dominant language to the closest supported one, then train a custom model as others have suggested.
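
As a rough illustration, here is a minimal sketch of forcing the tagger's language and feeding it your list of names. It assumes English is the closest supported language, which may not hold for your audio. It also swaps in an `NLGazetteer` (iOS 13 / macOS 10.15 and later) as a lighter-weight alternative to training a model. The gazetteer is not something proposed elsewhere in this thread, and I have not verified how well it overrides the built-in place-name tagging. `parseNames(fromCSV:)` and `personalNames(in:welshNames:)` are hypothetical helper names, and the CSV is assumed to hold one name per line in its first column.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: takes the first column of each non-empty CSV line as a name.
/// Assumes a simple CSV with no quoted fields.
func parseNames(fromCSV csv: String) -> [String] {
    csv.split(whereSeparator: \.isNewline)
        .compactMap { line in
            line.split(separator: ",", omittingEmptySubsequences: false).first
        }
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

#if canImport(NaturalLanguage)
/// Hypothetical helper: returns the tokens tagged as personal names after
/// fixing the language and attaching a gazetteer built from `welshNames`.
@available(iOS 13.0, macOS 10.15, tvOS 13.0, watchOS 6.0, *)
func personalNames(in text: String, welshNames: [String]) throws -> [String] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text
    let fullRange = text.startIndex..<text.endIndex

    // Assumption: English is the closest supported language for this text.
    tagger.setLanguage(.english, range: fullRange)

    // Map the personal-name label to the list of Welsh names.
    let gazetteer = try NLGazetteer(
        dictionary: [NLTag.personalName.rawValue: welshNames],
        language: .english
    )
    tagger.setGazetteers([gazetteer], for: .nameType)

    var found: [String] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    tagger.enumerateTags(in: fullRange, unit: .word, scheme: .nameType, options: options) { tag, range in
        if tag == .personalName {
            found.append(String(text[range]))
        }
        return true
    }
    return found
}
#endif
```

You would call it along the lines of `try personalNames(in: "Hey I'm Osian", welshNames: parseNames(fromCSV: csvText))`, where `csvText` is the contents of your CSV. If the gazetteer does not help enough, a custom word-tagger model attached with `setModels(_:forTagScheme:)` is the heavier option mentioned in the comments.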

jz_