I'm working on an app that can fix broken (misspelled) words.

Let's take:

mny people say there is a error in this sentence

In Swift we can use UITextChecker and get a good list of what the word mny could actually be. However, I get a couple of choices: one of them is many, and among the others is money, which obviously wouldn't fit well in this sentence. Is there any way to check whether the sentence itself is logical?

Aleksey Potapov
Vollan
  • No reason for down vote or "close" nice.. thank you – Vollan Nov 13 '19 at 09:20
  • There is no built-in way to check for the correctness of the sentence. You need either to create that logic by yourself or use third-party APIs like https://www.grammarbot.io/ – Sahil Manchanda Nov 25 '19 at 09:20
  • The question you asked is quite non-trivial. Check this open-source project https://github.com/languagetool-org/languagetool – Aleksey Potapov Nov 25 '19 at 10:32
  • @SahilManchanda @AlekseyPotapov Thank you both. I've looked around some in the swift language both at `UITextChecker` and the `NaturalLanguageToolkit` and thought you could work some magic with that. But it appears as if it's not possible then. – Vollan Nov 25 '19 at 10:35
  • It is still possible. While thinking about your question I was curious enough to browse the internet, and I came to the idea that when you use UITextChecker along with NLTagger, all you can get is something like this: misspell: mny suggestions: Optional(["may", "any", "my", "many"]) mny: Adjective people: Noun say: Verb there: Pronoun is: Verb an: Determiner error: Noun in: Preposition this: Determiner sentence: Noun ---- may: Noun any: Determiner my: OtherWord many: Adjective – Aleksey Potapov Nov 25 '19 at 10:45
  • and after this you could check against English word order. But that goes wrong. In your case you could have the sentence "my mother says there is a error in this sentence", and there "My" is a Determiner. But "Mny" is tagged as an Adjective within the sentence, while "My" on its own is tagged as OtherWord by NLTagger. So I don't see another approach but to find [an online tool](https://stackoverflow.com/questions/36856145/what-are-the-spell-correct-apis-available) or [write your own](https://machinelearnings.co/deep-spelling-9ffef96a24f6) or try digging into Core ML – Aleksey Potapov Nov 25 '19 at 10:52
  • Here is a [Swift 3 (outdated) gist](https://gist.github.com/sergeytimoshin/ae2b7152ac425a8de1a1d2b47b0b27ce) of the famous [Python example](http://norvig.com/spell-correct.html) – Aleksey Potapov Nov 25 '19 at 10:54
  • @AlekseyPotapov Thank you. Yeah, I was thinking of what you mentioned first, and it works somewhat. Because you have many people, which then creates adverb people, I would have had to check both adverbs and adjectives to come to the solution. But feel free to write your last one as an answer, and if it works (will try later) I will mark it as the right one. – Vollan Nov 25 '19 at 10:58

1 Answer

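Note that this still needs to be improved. I updated this [Swift 3 solution](https://gist.github.com/sergeytimoshin/ae2b7152ac425a8de1a1d2b47b0b27ce) to Swift 5. It is worth mentioning that it was originally inspired by this [Python tutorial](http://norvig.com/spell-correct.html).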

Create a new iOS project and add a text file named bigtext.txt containing the training text. This will be our "learning" dictionary. Then, in ViewController:

import UIKit
import NaturalLanguage

class ViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()

        let inputString = "mny people say there is a error in this sentence"

        // Read a text file and "study" the model
        guard let path = Bundle.main.path(forResource: "bigtext", ofType: "txt") else {
            print("Path not available")
            return
        }
        let checker = SpellChecker(contentsOfFile: path)

        // Rebuild the sentence piece by piece. Ranges taken from `inputString` must not
        // be used to index a mutated copy: a replacement of a different length would
        // invalidate them.
        var newString = ""
        var cursor = inputString.startIndex

        // NLTokenizer is a convenient way to iterate over the words in a sentence
        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = inputString
        tokenizer.enumerateTokens(in: inputString.startIndex..<inputString.endIndex) { tokenRange, _ in
            let word = String(inputString[tokenRange])
            let checked = checker?.correct(word: word)
            let candidates = checker?.candidates(word: word)

            // Copy the text between the previous token and this one, then the (corrected) word
            newString.append(contentsOf: inputString[cursor..<tokenRange.lowerBound])
            newString.append(checked ?? word)
            cursor = tokenRange.upperBound

            if word == checked {
                print("\(word) unchanged")
            } else {
                print("Correct:\t\(word) -> \(String(describing: checked))")
                print("Candidates:\t\(word) -> \(String(describing: candidates))")
            }
            return true
        }
        // Copy whatever follows the last token
        newString.append(contentsOf: inputString[cursor...])
        print("Result: \(newString)")
    }
}

func edits(word: String) -> Set<String> {
    if word.isEmpty { return [] }

    let splits = word.indices.map {
        (word[word.startIndex..<$0], word[$0..<word.endIndex])
    }

    let deletes = splits.map { $0.0 +  String($0.1.dropFirst()) }

    let transposes: [String] = splits.map { left, right in
        if let fst = right.first {
            let drop1 = String(right.dropFirst())
            if let snd = drop1.first {
                let drop2 = String(drop1.dropFirst())
                return "\(left)\(snd)\(fst)\(drop2)"
            }
        }
        return ""
    }.filter { !$0.isEmpty }

    let alphabet = "abcdefghijklmnopqrstuvwxyz"

    let replaces = splits.flatMap { left, right in
        alphabet.map { "\(left)\($0)\(String(right.dropFirst()))" }
    }

    let inserts = splits.flatMap { left, right in
        alphabet.map { "\(left)\($0)\(right)" }
    }
    // Note: only the first deletion is included here; Norvig's original unions all deletions.
    let setString = [String(deletes.first!)] + transposes + replaces + inserts
    return Set(setString)
}

struct SpellChecker {

    var knownWords: [String:Int] = [:]

    mutating func train(word: String) {
        if let idx = knownWords[word] {
            knownWords[word] = idx + 1
        }
        else {
            knownWords[word] = 1
        }
    }

    init?(contentsOfFile file: String) {
        do {
            let text = try String(contentsOfFile: file, encoding: .utf8).lowercased()
            let words = text.unicodeScalars.split(whereSeparator: { !("a"..."z").contains($0) }).map { String($0) }
            for word in words { self.train(word: word) }
        }
        catch {
            return nil
        }
    }

    func knownEdits2(word: String) -> Set<String>? {
        var known_edits: Set<String> = []
        for edit in edits(word: word) {
            if let k = known(words: edits(word: edit)) {
                known_edits.formUnion(k)
            }
        }
        return known_edits.isEmpty ? nil : known_edits
    }

    func known<S: Sequence>(words: S) -> Set<String>? where S.Iterator.Element == String {
        let s = Set(words.filter { self.knownWords.index(forKey: $0) != nil })
        return s.isEmpty ? nil : s
    }

    func candidates(word: String) -> Set<String> {
        guard let result = known(words: [word]) ?? known(words: edits(word: word)) ?? knownEdits2(word: word) else {
            return Set<String>()
        }

        return result
    }

    func correct(word: String) -> String {
        return candidates(word: word).reduce(word) {
            (knownWords[$0] ?? 1) < (knownWords[$1] ?? 1) ? $1 : $0
        }
    }
}

This will output:

Correct:    mny -> Optional("may")
Candidates: mny -> Optional(Set(["any", "ny", "may", "many"]))
people unchanged
say unchanged
there unchanged
is unchanged
a unchanged
error unchanged
in unchanged
this unchanged
sentence unchanged
Result: may people say there is a error in this sentence
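
The `edits` function above keeps only the first deletion (`deletes.first!`) and never inserts a letter after the last character. That is why the candidates for "mny" include "ny" but cannot include "my". Below is a minimal sketch of a fuller one-edit generator, following the structure of `edits1` in Norvig's Python tutorial. The name `editsDistance1` is hypothetical and this is not the code that produced the output above:

```swift
import Foundation

/// All strings one edit away from `word`: deletions, adjacent transpositions,
/// replacements and insertions (including an insertion after the last letter).
/// `editsDistance1` is a hypothetical name used for illustration only.
func editsDistance1(_ word: String) -> Set<String> {
    let letters = Array("abcdefghijklmnopqrstuvwxyz")
    let chars = Array(word)
    var result = Set<String>()

    // Split positions run from 0 through chars.count, so the empty right part
    // (needed for inserting at the very end) is covered as well.
    for i in 0...chars.count {
        let left = String(chars[..<i])
        let right = chars[i...]   // ArraySlice, shares indices with `chars`

        // Deletion: drop the first character of the right part.
        if !right.isEmpty {
            result.insert(left + String(right.dropFirst()))
        }
        // Transposition: swap the first two characters of the right part.
        if right.count > 1 {
            result.insert(left + String(right[i + 1]) + String(right[i]) + String(right.dropFirst(2)))
        }
        for letter in letters {
            // Replacement: substitute the first character of the right part.
            if !right.isEmpty {
                result.insert(left + String(letter) + String(right.dropFirst()))
            }
            // Insertion: put a letter between the left and right parts.
            result.insert(left + String(letter) + String(right))
        }
    }
    return result
}

let generated = editsDistance1("mny")
print(generated.contains("my"), generated.contains("many"))  // prints "true true"
```

With this variant the candidate list, and possibly the chosen correction, may differ from the output shown above, depending on the word counts in the training text.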

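Please note that we simply took the top correction candidate (the one most frequent in the training text), without looking at the rest of the sentence. To choose properly, we would first need to consider the word order and understand the sentence context.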
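
One rough way to approximate sentence context is to count adjacent word pairs (bigrams) in the training text, then prefer the candidate that most often appears next to its neighbours. The sketch below assumes such counts are good enough to separate "many people" from "may people". `BigramModel` and its methods are hypothetical names, not part of the code above or of any Apple framework:

```swift
import Foundation

/// Hypothetical helper: counts adjacent word pairs in a training text.
struct BigramModel {
    private var counts: [String: Int] = [:]

    init(text: String) {
        // Same tokenisation idea as SpellChecker: lowercase runs of a-z.
        // Simplification: pairs are also counted across sentence boundaries.
        let letters: ClosedRange<Character> = "a"..."z"
        let words = text.lowercased()
            .split(whereSeparator: { !letters.contains($0) })
            .map { String($0) }
        for (first, second) in zip(words, words.dropFirst()) {
            counts["\(first) \(second)", default: 0] += 1
        }
    }

    /// How often `first` was directly followed by `second` in the training text.
    func count(_ first: String, _ second: String) -> Int {
        counts["\(first) \(second)"] ?? 0
    }

    /// Picks the candidate seen most often next to its neighbours.
    func best(of candidates: Set<String>, previous: String?, next: String?) -> String? {
        func score(_ candidate: String) -> Int {
            let before = previous.map { count($0, candidate) } ?? 0
            let after = next.map { count(candidate, $0) } ?? 0
            return before + after
        }
        // Sorting first makes the result independent of Set iteration order.
        return candidates.sorted().max { score($0) < score($1) }
    }
}

// Tiny made-up training text, for illustration only.
let model = BigramModel(text: "many people agree. many people say so. you may go. any time.")
let choice = model.best(of: ["any", "ny", "may", "many"], previous: nil, next: "people")
print(choice ?? "no candidate")  // prints "many"
```

In the view controller above, the candidates would come from `candidates(word:)` and the neighbours from the adjacent tokens. A real model would need far more text and some smoothing for pairs it has never seen.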

Aleksey Potapov
  • Right, so both many and may suit this one, and this one will select may? I guess "may" kind of turns it into a question? But at the same time, I guess it's as close as I'll ever get. – Vollan Nov 25 '19 at 13:36
  • Seems like both here are valid. This one could offer candidates, and with them you could proceed. I think, the model could be "trained" using another type of text. – Aleksey Potapov Nov 25 '19 at 14:10