I use LanguageTool for some spellchecking and spell correction functionality in my application.

The LanguageTool documentation describes how to exclude words from spell checking (by calling the addIgnoreTokens(...) method of the spell checking rule you're using).

How do I add words (e.g., from a specific dictionary) to the spell checker? That is, can LanguageTool correct misspelled words and suggest words from my own dictionary?

Dave Jarvis
Simplex

2 Answers

Unfortunately, I don't think the API supports this. Without the API, you can add words to spelling.txt so that they are accepted and used as suggestions. With the API, you might need to extend MorfologikSpellerRule and change this place in the code. (Disclosure: I'm the maintainer of LanguageTool.)

Daniel Naber
  • Thanks a lot! As I understand it, if I add words to spelling.txt, they are added to the suggestion list only at initialization? And if I want to add words while the app is running, I need to extend MorfologikSpellerRule (and change the code)? – Simplex Dec 13 '16 at 21:40
  • Yes, that's how it is. – Daniel Naber Dec 14 '16 at 08:07
  • I have created a class that extends MorfologikSpellerRule, but when I check the suggestion list, the words from my externally provided list do not appear. – Maulik Patel Sep 06 '17 at 13:19

I had a similar requirement: loading custom words into the dictionary as suggested words, not just as ignored words. In the end I extended MorfologikSpellerRule to do this:

  • Create a class MorfologikSpellerRuleEx that extends MorfologikSpellerRule, override the method match(), and write my own initSpellerEx() method to create the spellers.
  • Then, in the JLanguageTool instance, disable the existing spelling rule and add this custom rule in its place.

Code:

Language lang = new AmericanEnglish();
JLanguageTool langTool = new JLanguageTool(lang);
langTool.disableRule("MORFOLOGIK_RULE_EN_US");

try {
    MorfologikSpellerRuleEx spellingRule = new MorfologikSpellerRuleEx(JLanguageTool.getMessageBundle(), lang);
    // spellingFilePath points to a file containing my own words plus the words from /hunspell/spelling_en-US.txt
    spellingRule.setSpellingFilePath(spellingFilePath);
    langTool.addRule(spellingRule);

} catch (IOException e) {
    e.printStackTrace();
}
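The file behind spellingFilePath has to contain both the bundled words and the custom ones. Below is a minimal sketch of one way to build such a file using only the JDK. The class name WordListMerger and its methods are hypothetical and not part of LanguageTool; the sketch assumes both inputs are UTF-8 word lists with one word per line and that lines starting with '#' are comments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of LanguageTool.
public class WordListMerger {

    /** Merges word lists in order, dropping blank lines, '#' comment lines and duplicates. */
    public static Set<String> merge(List<List<String>> wordLists) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> list : wordLists) {
            for (String line : list) {
                String word = line.trim();
                // Assumption: '#' starts a comment line in these word lists.
                if (!word.isEmpty() && !word.startsWith("#")) {
                    merged.add(word);
                }
            }
        }
        return merged;
    }

    /** Reads both files as UTF-8 and writes the merged list to target, one word per line. */
    public static Path mergeFiles(Path baseWords, Path customWords, Path target) throws IOException {
        Set<String> merged = merge(Arrays.asList(
                Files.readAllLines(baseWords, StandardCharsets.UTF_8),
                Files.readAllLines(customWords, StandardCharsets.UTF_8)));
        return Files.write(target, merged, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for a copy of spelling_en-US.txt and a custom list.
        Path base = Files.createTempFile("base", ".txt");
        Path custom = Files.createTempFile("custom", ".txt");
        Path target = Files.createTempFile("merged", ".txt");
        Files.write(base, Arrays.asList("# bundled words", "alpha", "beta"), StandardCharsets.UTF_8);
        Files.write(custom, Arrays.asList("beta", "gamma"), StandardCharsets.UTF_8);

        mergeFiles(base, custom, target);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8)); // [alpha, beta, gamma]
    }
}
```

The path of the merged file can then be passed to setSpellingFilePath(...). The bundled word list would first need to be copied out of the LanguageTool resources, which this sketch does not cover.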

The code of my custom MorfologikSpellerRuleEx:

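import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.ResourceBundle;
import java.util.regex.Matcher;

import org.languagetool.AnalyzedSentence;
import org.languagetool.AnalyzedTokenReadings;
import org.languagetool.JLanguageTool;
import org.languagetool.Language;
import org.languagetool.rules.RuleMatch;
import org.languagetool.rules.spelling.morfologik.MorfologikMultiSpeller;
import org.languagetool.rules.spelling.morfologik.MorfologikSpellerRule;

public class MorfologikSpellerRuleEx extends MorfologikSpellerRule {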

private String spellingFilePath = null;
private boolean ignoreTaggedWords = false;

public MorfologikSpellerRuleEx(ResourceBundle messages, Language language) throws IOException {
    super(messages, language);
}

@Override
public String getFileName() {
    return "/en/hunspell/en_US.dict";
}

@Override
public String getId() {
    return "MORFOLOGIK_SPELLING_RULE_EX";
}

@Override
public void setIgnoreTaggedWords() {
    ignoreTaggedWords = true;
}

public String getSpellingFilePath() {
    return spellingFilePath;
}

public void setSpellingFilePath(String spellingFilePath) {
    this.spellingFilePath = spellingFilePath;
}

private void initSpellerEx(String binaryDict) throws IOException {
    String plainTextDict = null;
    if (JLanguageTool.getDataBroker().resourceExists(getSpellingFileName())) {
        plainTextDict = getSpellingFileName();
    }
    if (plainTextDict != null) {

        // Each speller reads its BufferedReader to the end, so open a fresh reader per
        // speller; a shared reader would likely leave speller2 and speller3 without
        // the custom words.
        MorfologikMultiSpeller[] custom = null;
        if (this.spellingFilePath != null) {
            custom = new MorfologikMultiSpeller[3];
            for (int i = 0; i < custom.length; i++) {
                try (BufferedReader br = new BufferedReader(new FileReader(this.spellingFilePath))) {
                    custom[i] = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, i + 1);
                } catch (java.io.FileNotFoundException e) {
                    custom = null; // custom file missing: fall back to the bundled spelling file
                    break;
                }
            }
        }

        if (custom != null) {
            speller1 = custom[0];
            speller2 = custom[1];
            speller3 = custom[2];
        }
        else {
            speller1 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 1);
            speller2 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 2);
            speller3 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 3);
        }

        setConvertsCase(speller1.convertsCase());
    } else {
        throw new RuntimeException("Could not find ignore spell file in path: " + getSpellingFileName());
    }
}

private boolean canBeIgnored(AnalyzedTokenReadings[] tokens, int idx, AnalyzedTokenReadings token)
        throws IOException {
    return token.isSentenceStart() || token.isImmunized() || token.isIgnoredBySpeller() || isUrl(token.getToken())
            || isEMail(token.getToken()) || (ignoreTaggedWords && token.isTagged()) || ignoreToken(tokens, idx);
}   

@Override
public RuleMatch[] match(AnalyzedSentence sentence) throws IOException {
    List<RuleMatch> ruleMatches = new ArrayList<>();
    AnalyzedTokenReadings[] tokens = getSentenceWithImmunization(sentence).getTokensWithoutWhitespace();
    // lazy init
    if (speller1 == null) {
        String binaryDict = null;
        if (JLanguageTool.getDataBroker().resourceExists(getFileName())) {
            binaryDict = getFileName();
        }
        if (binaryDict != null) {
            initSpellerEx(binaryDict);  //here's the change
        } else {
            // should not happen, as we only configure this rule (or rather its subclasses)
            // when we have the resources:
            return toRuleMatchArray(ruleMatches);
        }
    }
    int idx = -1;
    for (AnalyzedTokenReadings token : tokens) {
        idx++;
        if (canBeIgnored(tokens, idx, token)) {
            continue;
        }
        // if we use token.getToken() we'll get ignored characters inside and speller
        // will choke
        String word = token.getAnalyzedToken(0).getToken();
        if (tokenizingPattern() == null) {
            ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
        } else {
            int index = 0;
            Matcher m = tokenizingPattern().matcher(word);
            while (m.find()) {
                String match = word.subSequence(index, m.start()).toString();
                ruleMatches.addAll(getRuleMatches(match, token.getStartPos() + index, sentence));
                index = m.end();
            }
            if (index == 0) { // tokenizing char not found
                ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
            } else {
                ruleMatches.addAll(getRuleMatches(word.subSequence(index, word.length()).toString(),
                        token.getStartPos() + index, sentence));
            }
        }
    }
    return toRuleMatchArray(ruleMatches);
}   

}
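A note on the readers used in initSpellerEx: a BufferedReader is consumed as it is read, so a reader that one consumer has read to the end has nothing left for the next one. The sketch below demonstrates this with the JDK alone. ReaderReuseDemo and readAll are hypothetical names, and whether this affects the spellers depends on how the LanguageTool version in use consumes the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, unrelated to the LanguageTool API.
public class ReaderReuseDemo {

    /** Reads every remaining line from the reader, the way a word-list loader might. */
    public static List<String> readAll(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String words = "foo\nbar\n";

        // One shared reader: the second consumer finds it already exhausted.
        BufferedReader shared = new BufferedReader(new StringReader(words));
        System.out.println(readAll(shared).size()); // 2
        System.out.println(readAll(shared).size()); // 0

        // A fresh reader per consumer: each one sees the full word list.
        System.out.println(readAll(new BufferedReader(new StringReader(words))).size()); // 2
        System.out.println(readAll(new BufferedReader(new StringReader(words))).size()); // 2
    }
}
```

If the custom words show up for some suggestions but not others, opening a separate reader for each speller is one thing to try.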

chenlao