I am trying to train an en-ner-location.bin model using OpenNLP in Java. My training text file is in the following format:

    <START:location> Fontana <END> <START:location> Palo Verde <END> <START:location> Picacho <END>

I trained the model using the following code:

    import java.io.BufferedOutputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.Charset;
    import java.util.Collections;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.Span;

    public class TrainNames {
        @SuppressWarnings("deprecation")
        public void TrainNames() throws IOException {
            File fileTrainer = new File("citytrain.txt");
            File output = new File("en-ner-location.bin");
            ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream(fileTrainer), "UTF-8");
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            System.out.println("lineStream = " + lineStream);
            TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream, Collections.<String, Object>emptyMap(), 1, 0);

            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(output));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }

I get no errors or warnings, but when I try to extract a city name from a string like this: cnt = "John is planning to specialize in Electrical Engineering in UC Fontana and pursue a career with IBM."; it returns the whole string. Can anybody tell me why?

1 Answer

Welcome to SO! It looks like you need more context around each location annotation. I believe that right now OpenNLP thinks you are training it to tag any word as a location, because each training sample consists of nothing but the location name itself. You need to annotate locations within whole sentences, and you will need at least a few hundred samples to start seeing good results.

See this answer as well: "How I train an Named Entity Recognizer identifier in OpenNLP?"
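To make the idea of annotating locations inside whole sentences concrete, here is a minimal sketch that turns plain, whitespace-tokenized sentences into training lines with `<START:location> ... <END>` tags. The `LocationAnnotator` class and its `annotate` method are hypothetical helpers written for this illustration and are not part of OpenNLP; the sketch also assumes the sentences are already tokenized with spaces (including around punctuation) and that exact, case-sensitive matching against a known list of names is good enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of OpenNLP): wraps known location names in
 * <START:location> ... <END> tags inside whitespace-tokenized sentences.
 */
final class LocationAnnotator {

    /** Tags every occurrence of a known location in one tokenized sentence. */
    static String annotate(String sentence, List<String> locations) {
        String[] tokens = sentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = matchLength(tokens, i, locations);
            if (matched > 0) {
                // Keep the surrounding words; only the name itself is wrapped.
                out.add("<START:location>");
                out.addAll(Arrays.asList(tokens).subList(i, i + matched));
                out.add("<END>");
                i += matched;
            } else {
                out.add(tokens[i]);
                i++;
            }
        }
        return String.join(" ", out);
    }

    /** Token count of the longest location starting at index i, or 0 if none. */
    private static int matchLength(String[] tokens, int i, List<String> locations) {
        int best = 0;
        for (String location : locations) {
            String[] parts = location.trim().split("\\s+");
            if (parts.length > best && i + parts.length <= tokens.length
                    && Arrays.equals(parts, Arrays.copyOfRange(tokens, i, i + parts.length))) {
                best = parts.length;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("Fontana", "Palo Verde", "Picacho");
        // One annotated sentence per line is what goes into the training file.
        System.out.println(annotate(
                "John is planning to specialize in Electrical Engineering in UC Fontana .", locations));
        System.out.println(annotate(
                "We drove from Palo Verde to Picacho last week .", locations));
    }
}
```

Each printed line keeps the words around the location, which is the context the name finder needs in order to learn where locations tend to appear. Writing one such sentence per line into citytrain.txt, a few hundred in total, is the kind of training data described above.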

Mark Giaconia
  • Hi, I changed the training file as you suggested and included 100 tagged sentences containing the city name, but it still did not work. Where do you think I went wrong? – user3649086 May 31 '14 at 07:37
  • Try changing your call to .train above to this: TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream, null); – Mark Giaconia May 31 '14 at 19:09