
I'm trying to check whether a word is spelled correctly using WordNet. Here's my SpellChecker.java implementation so far:

package com.domain.wordnet;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Collection;

import net.didion.jwnl.JWNL;
import net.didion.jwnl.JWNLException;
import net.didion.jwnl.data.IndexWord;
import net.didion.jwnl.data.IndexWordSet;
import net.didion.jwnl.data.Synset;
import net.didion.jwnl.dictionary.Dictionary;

public class SpellChecker {

    private static Dictionary dictionary = null;
    private static final String PROPS = "/opt/jwnl/jwnl14-rc2/config/file_properties.xml";

    static {
        try(InputStream is = new FileInputStream(PROPS)) {
            JWNL.initialize(is);
            dictionary = Dictionary.getInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println(isCorrect("change"));    //  true
        System.out.println(isCorrect("changes"));   //  false
        System.out.println(isCorrect("changed"));   //  true
        System.out.println(isCorrect("changing"));  //  true
        System.out.println();
        System.out.println(isCorrect("analyze"));   //  true
        System.out.println(isCorrect("analyzed"));  //  true
        System.out.println(isCorrect("analyzing")); //  false
    }

    public static boolean isCorrect(String token) {
        try {
            token = token.trim().toLowerCase();
            IndexWordSet set = dictionary.lookupAllIndexWords(token);
            if(set == null)
                return false;

            @SuppressWarnings("unchecked")
            Collection<IndexWord> collection = set.getIndexWordCollection();
            if(collection == null || collection.isEmpty())
                return false;

            for(IndexWord word : collection) {
                Synset[] senses = word.getSenses();
                if(senses != null && senses.length > 0
                        && senses[0].toString().toLowerCase().contains(token)) {
                    return true;
                }
            }

            return false;
        } catch (JWNLException e) {
            e.printStackTrace();
            return false;
        }
    }
}

It works fine in most cases, but as you can see it fails for plurals and some -ing forms. Can I avoid these failures for plural and -ing forms without breaking the rules of English?

In the WordNet Browser, changes is a valid word, but through the Java API it comes back as invalid.


I don't know what I need to correct. Is there a better approach to overcome this issue?

User
  • `isCorrect("analying")` returning false seems entirely correct as `analying` is not the correct word here as far as I am aware. `analyzing` would be. – Ben Jul 05 '19 at 06:25
  • Hey @Ben, My bad! I have corrected my own spelling mistake.. :( But still it is false for _analyzing_ – User Jul 05 '19 at 06:29
  • try some nlp libraries? – Kris Jul 05 '19 at 06:58
  • @Kris Definitely I will go for other NLP solutions, but first I want to make my job done using WordNet only, because it is being used in the same project already. – User Jul 05 '19 at 08:42

1 Answer


The mistake is in this loop:

for(IndexWord word : collection) {
                Synset[] senses = word.getSenses();
                if(senses != null && senses.length > 0
                        && senses[0].toString().toLowerCase().contains(token)) {
                    return true;
                }
            }

The line Synset[] senses = word.getSenses() returns all senses of the word, but you check only the first one (index 0). The word will be available in one of the senses, so check them all. Something like this:

for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    for (Synset sense : senses) {
        if (sense.getGloss().toLowerCase().contains(token)) {
            return true;
        }
    }
}

In addition, the -ing forms of words may not be available as senses. I'm not sure why you search the senses to decide whether it's a valid word.

A check like if(set.getLemma() != null) return true;

should be enough to decide the spell check, right?
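To illustrate why a contains-style check rejects inflected forms while a plain lookup accepts them, here is a minimal, self-contained sketch. It does not use JWNL at all: the INDEX map, the naive suffix stripping in lookupBaseForm, and the names containsCheck and lookupCheck are hypothetical stand-ins. The assumption is only that the real lookup reduces an inflected token to a base form before returning an entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LookupSketch {

    // Hypothetical stand-in for a dictionary index: base form -> text of its first sense.
    static final Map<String, String> INDEX = new HashMap<>();
    static {
        INDEX.put("change", "change, alteration, modification: an event that occurs "
                + "when something passes from one state to another");
        INDEX.put("analyze", "analyze, analyse, study, examine: consider in detail");
    }

    // Naive suffix stripping, standing in for a real morphological processor.
    static Optional<String> lookupBaseForm(String token) {
        String t = token.trim().toLowerCase();
        List<String> candidates = new ArrayList<>();
        candidates.add(t);
        if (t.endsWith("s")) {
            candidates.add(t.substring(0, t.length() - 1));
        }
        if (t.endsWith("ed")) {
            candidates.add(t.substring(0, t.length() - 2));
            candidates.add(t.substring(0, t.length() - 1));
        }
        if (t.endsWith("ing")) {
            String stem = t.substring(0, t.length() - 3);
            candidates.add(stem);
            candidates.add(stem + "e");
        }
        for (String candidate : candidates) {
            if (INDEX.containsKey(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Mirrors the question's check: the sense text must contain the raw token.
    static boolean containsCheck(String token) {
        String t = token.trim().toLowerCase();
        return lookupBaseForm(t).map(base -> INDEX.get(base).contains(t)).orElse(false);
    }

    // Accepts any token for which the lookup found an entry.
    static boolean lookupCheck(String token) {
        return lookupBaseForm(token).isPresent();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"change", "changes", "analyzing", "chnage"}) {
            System.out.println(token + " contains=" + containsCheck(token)
                    + " lookup=" + lookupCheck(token));
        }
    }
}
```

In this sketch "changes" and "analyzing" resolve to the entries for "change" and "analyze", whose text never contains the inflected token, so containsCheck rejects them while lookupCheck accepts them. A real misspelling such as "chnage" finds no entry and is rejected by both.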

Kris
  • Yeah, I am just using the implementation written by this guy on following link.. https://stackoverflow.com/a/34051675/4306260 – User Jul 06 '19 at 16:04