I have gold data in which I annotated all room numbers in several documents. I want to use OpenNLP to train a model on this data that classifies room numbers, but I am stuck on where to start. I have read the OpenNLP maxent documentation, looked at the examples in opennlp.tools, and am now looking at opennlp.tools.ml.maxent. It seems to be what I should use, but I still have no idea how to use it. Can somebody give me a basic idea of how to use OpenNLP maxent and where to start? Any help will be appreciated.
-
this post should help http://stackoverflow.com/questions/24381095/writing-our-own-models-in-opennlp/24406829#24406829 – Mark Giaconia Jul 23 '14 at 15:17
-
Thank you. Could you also tell me how to convert eHOST annotations into openNLP format? – user2788945 Jul 23 '14 at 15:30
-
Sorry, I don't know; I have never worked with eHOST. But my instinct tells me that if the eHOST format can be parsed with regex or something similar, you should be able to convert those tags to OpenNLP tags – Mark Giaconia Jul 23 '14 at 19:58
1 Answer
4
This is a minimal working example that demonstrates the usage of the OpenNLP Maxent API.
It includes the following:
- Training a maxent model from data stored in a file.
- Storing the trained model into a file.
- Loading the trained model from a file.
- Using the model for classification.
- NOTE: the outcome is the first element in each training sample.
- NOTE: the values can be arbitrary strings, e.g. `xyz=s0methIng`
```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import opennlp.maxent.GIS;
import opennlp.maxent.io.GISModelReader;
import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
import opennlp.model.AbstractModel;
import opennlp.model.AbstractModelWriter;
import opennlp.model.DataIndexer;
import opennlp.model.DataReader;
import opennlp.model.FileEventStream;
import opennlp.model.MaxentModel;
import opennlp.model.OnePassDataIndexer;
import opennlp.model.PlainTextFileDataReader;
...
String trainingFileName = "training-file.txt";
String modelFileName = "trained-model.maxent.gz";

// Training a model from data stored in a file.
// The training file contains one training sample per line.
// Outcome (result) is the first element on each line.
// Example:
// result=1 a=1 b=1
// result=0 a=0 b=1
// ...
DataIndexer indexer = new OnePassDataIndexer(new FileEventStream(trainingFileName));
MaxentModel trainedMaxentModel = GIS.trainModel(100, indexer); // 100 iterations

// Storing the trained model into a file for later use (gzipped)
File outFile = new File(modelFileName);
AbstractModelWriter writer = new SuffixSensitiveGISModelWriter((AbstractModel) trainedMaxentModel, outFile);
writer.persist();

// Loading the gzipped model from a file
FileInputStream inputStream = new FileInputStream(modelFileName);
InputStream decodedInputStream = new GZIPInputStream(inputStream);
DataReader modelReader = new PlainTextFileDataReader(decodedInputStream);
MaxentModel loadedMaxentModel = new GISModelReader(modelReader).getModel();

// Now predicting the outcome using the loaded model
String[] context = {"a=1", "b=0"};
double[] outcomeProbs = loadedMaxentModel.eval(context);
String outcome = loadedMaxentModel.getBestOutcome(outcomeProbs);
```
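
The training file described in the code comments above (one sample per line, outcome first, then space-separated `key=value` features) can be produced with plain Java. The following is only a sketch and is not part of OpenNLP: the class `TrainingFileSketch` and its methods `toLine`, `requireNoWhitespace`, and `writeFile` are hypothetical names made up for illustration. It assumes each sample is an outcome string plus an ordered map of feature names to values, and that none of them contains whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an OpenNLP class): formats samples as
// "outcome key=value key=value ...", one sample per line.
class TrainingFileSketch {

    // Builds one training line: the outcome comes first, then the features
    // in the iteration order of the map.
    static String toLine(String outcome, Map<String, String> features) {
        StringBuilder sb = new StringBuilder(requireNoWhitespace(outcome));
        for (Map.Entry<String, String> e : features.entrySet()) {
            sb.append(' ')
              .append(requireNoWhitespace(e.getKey()))
              .append('=')
              .append(requireNoWhitespace(e.getValue()));
        }
        return sb.toString();
    }

    // Elements are separated by spaces, so whitespace inside an element
    // would split it into two; reject such input (and empty strings) early.
    static String requireNoWhitespace(String s) {
        if (s.isEmpty() || s.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("empty or contains whitespace: '" + s + "'");
        }
        return s;
    }

    // Writes the already formatted lines to a UTF-8 text file.
    static void writeFile(Path path, List<String> lines) throws IOException {
        Files.write(path, lines, StandardCharsets.UTF_8);
    }
}
```

For example, calling `toLine("result=1", features)` with a `LinkedHashMap` holding `a` mapped to `1` and `b` mapped to `1` yields the line `result=1 a=1 b=1`, which matches the first sample in the comment above. The file written by `writeFile` could then be passed by name to `FileEventStream` as in the snippet.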

Viliam Simko
-
Can you provide a detailed implemented example, along with the training data sets and models? Also, can you help with this question: http://stackoverflow.com/questions/36032808/creating-training-data-for-a-maxent-classfier-in-java – Ankit Bansal Mar 22 '16 at 11:44
-
The OpenNLP Maxent classifier expects a textual input file which is formatted as follows: (1) single sample per line, (2) each feature separated by a space, and (3) written as "key=value". See my example inside the code snippet "// Example: ..." – Viliam Simko Mar 23 '16 at 15:18
-
@ViliamSimko Do `value` and `result` have to be numbers, or can they be any string (without spaces) as well? – Sheng Jul 14 '17 at 21:19
-
@Sheng The values can be arbitrary strings. I once used a sample like this: `wprefix:4=borr sppos=NNP pos=:` – Viliam Simko Jul 17 '17 at 08:14
-
@ViliamSimko Thank you! Are you able to answer another question of mine? https://stackoverflow.com/questions/45149041/training-opennlp-ner-model-on-features – Sheng Jul 17 '17 at 16:17