
I have a pretty easy question regarding WordNet and MIT JWI (Java API for accessing WordNet): I read a file into an array of strings, which I've split into words. How can I get a separate array of strings containing only the nouns using getPOS()? Thanks!

Example of what I've tried:

import java.io.File;
import java.io.IOException;
import java.net.URL;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;

public class test {

public static void main(String[] args) throws IOException {

    String sentence1 = "The cat ate the fish";

    String[] s1Split = sentence1.split(" ");

    // Point JWI at the "dict" folder of the local WordNet installation
    String wnhome = "C:/Program Files/WordNet/2.1";
    String path = wnhome + File.separator + "dict";
    URL url = new URL("file", null, path);
    IDictionary dict = new Dictionary(url);
    dict.open();

    for (int i = 0; i < s1Split.length; i++) {
        // this is where I got confused, wanted to use something like:
        // Word w = dict.getIndexWord(s1Split[i], ..) but I need a POS argument,
        // and I can't find another suitable method
        // if w.getPOS() is a noun I would add it to a separate vector
    }

}

}

EDIT: Just thought of another one: would it be reliable to use something like w = dict.getIndexWord(s1Split[i], POS.NOUN), so that w is null if the word doesn't exist as a noun? Would this be something worth trying?

EDIT2: So my question at the moment is whether there's any way I can transform a string (word) into a WordNet object, so I can use getPOS() on it?

demongolem
user573382

2 Answers


Your approach won't work as well as using a library built for the task: WordNet is designed as a 'dictionary/thesaurus' on steroids, not a parser. The Stanford Parser is a good place to look for an alternative.

That said, you can perform a lookup on each word, but if a word is both a noun and, say, a verb, you won't be able to tell which one it is in your sentence, because you're not considering syntax.

This should get you started (see the example on the bottom). Do the lookup for a noun; if nothing comes back, discard the word.
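A minimal sketch of that lookup-and-discard loop. So that it runs without a WordNet installation, the noun test is passed in as a `Predicate<String>` and backed here by a small stub set; `NounFilter`, `filterNouns` and `stubNouns` are names made up for this illustration. With JWI, the predicate would presumably be the null check from the question's first edit, shown in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class NounFilter {

    // Keeps only the words for which the supplied noun test succeeds.
    static List<String> filterNouns(String[] words, Predicate<String> isNoun) {
        List<String> nouns = new ArrayList<>();
        for (String word : words) {
            if (isNoun.test(word)) {
                nouns.add(word);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        String[] s1Split = "The cat ate the fish".split(" ");

        // Hypothetical stand-in for the dictionary, so the sketch runs on its own.
        // With an opened JWI dictionary the test would instead be something like:
        //   w -> dict.getIndexWord(w, POS.NOUN) != null
        Set<String> stubNouns = new HashSet<>(Arrays.asList("cat", "fish"));
        Predicate<String> isNoun = w -> stubNouns.contains(w.toLowerCase());

        System.out.println(filterNouns(s1Split, isNoun)); // prints [cat, fish]
    }
}
```

Keep in mind the caveat above: a dictionary lookup only says that a word can be a noun, not that it is used as one in this sentence.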

dfb

For JWNL it works as follows; I don't know whether it is the same for JWI, though.

If I have understood correctly, your problem is getting the POS (part-of-speech) tags. To do this you must use another tool, such as the Stanford POS Tagger. However, a tagger gives you a tag string for each word, so you must then convert from the POS in string format to the POS class of JWNL.
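A rough sketch of that conversion step, assuming the tagger emits Penn Treebank tags (NN, NNS, VBD, JJ, ...). The `Pos` enum is a hypothetical stand-in for the library's own POS class, and the tagged pairs are typed in by hand rather than produced by a real tagger.

```java
import java.util.ArrayList;
import java.util.List;

class TagMapper {

    // Hypothetical stand-in for the WordNet library's POS class.
    enum Pos { NOUN, VERB, ADJECTIVE, ADVERB }

    // Maps a Penn Treebank tag to one of the four WordNet categories,
    // or returns null for tags with no counterpart (determiners, prepositions, ...).
    static Pos fromPennTag(String tag) {
        if (tag.startsWith("NN")) return Pos.NOUN;      // NN, NNS, NNP, NNPS
        if (tag.startsWith("VB")) return Pos.VERB;      // VB, VBD, VBG, VBN, VBP, VBZ
        if (tag.startsWith("JJ")) return Pos.ADJECTIVE; // JJ, JJR, JJS
        if (tag.startsWith("RB")) return Pos.ADVERB;    // RB, RBR, RBS
        return null;
    }

    public static void main(String[] args) {
        // Hand-written (word, tag) pairs standing in for real tagger output.
        String[][] tagged = {
            {"The", "DT"}, {"cat", "NN"}, {"ate", "VBD"}, {"the", "DT"}, {"fish", "NN"}
        };

        List<String> nouns = new ArrayList<>();
        for (String[] pair : tagged) {
            if (fromPennTag(pair[1]) == Pos.NOUN) {
                nouns.add(pair[0]);
            }
        }
        System.out.println(nouns); // prints [cat, fish]
    }
}
```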

roschach