Using the OpenNLP doccat API, you can create training data and then train a model from it. A useful property of the resulting categorizer is that it returns a probability distribution over your whole set of categories rather than just a single best label.
So if you create a file with this format:
customerserviceproblems They did not respond
customerserviceproblems They didn't respond
customerserviceproblems They didn't respond at all
customerserviceproblems They did not respond at all
customerserviceproblems I received no response from the website
customerserviceproblems I did not receive response from the website
and so on. Provide as many samples as possible, one per line: the category name first, then whitespace, then the sample text. Make sure each line ends with a \n newline.
Using this approach you can add anything you want that means "customer service problems", and you can add any other categories as well, so you don't have to be too deterministic about which data falls into which category.
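If your samples live in code or a database rather than a hand-edited file, a small helper can write them in the format described above. This is only a sketch: the class name TrainingFileWriter and its methods are made up for illustration and are not part of OpenNLP. It assumes a category is a single whitespace-free token and collapses any whitespace inside the sample text so that each sample stays on one line.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an OpenNLP class.
public class TrainingFileWriter {

    /** Formats one sample as "<category> <text>\n", keeping the text on a single line. */
    static String formatLine(String category, String text) {
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be a single token: " + category);
        }
        // Collapse tabs, newlines and repeated spaces into single spaces
        String singleLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + singleLine + "\n";
    }

    /** Writes the samples as UTF-8; each element of samples is {category, text}. */
    static void write(Path out, List<String[]> samples) throws IOException {
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String[] s : samples) {
                w.write(formatLine(s[0], s[1]));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> samples = new ArrayList<>();
        samples.add(new String[] {"customerserviceproblems", "They did not respond"});
        samples.add(new String[] {"customerserviceproblems", "I received no response from the website"});
        Path out = Files.createTempFile("doccat-train", ".txt");
        write(out, samples);
        System.out.println("Wrote " + samples.size() + " samples to " + out);
    }
}
```

The resulting file can then be passed to the training code as the samples file.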
Here is what the Java looks like to build the model:
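    import java.io.*;
    import opennlp.tools.doccat.*;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    // Written against the older OpenNLP 1.5.x API; newer releases may use different signatures here.
    DoccatModel model = null;
    // try-with-resources closes the input stream even if training fails
    try (InputStream dataIn = new FileInputStream(yourFileOfSamplesLikeAbove)) {
        ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
        model = DocumentCategorizerME.train("en", sampleStream);
        // Close the output stream once the model has been written
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelOutFile))) {
            model.serialize(modelOut);
        }
        System.out.println("Model complete!");
    } catch (IOException e) {
        // Failed to read or parse the training data, or to write the model
        e.printStackTrace();
    }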
Once you have the model, you can then use it something like this:
    DocumentCategorizerME documentCategorizerME;
    DoccatModel doccatModel;
    // Load the serialized model once and reuse the categorizer for every call
    doccatModel = new DoccatModel(new File(pathToModelYouJustMade));
    documentCategorizerME = new DocumentCategorizerME(doccatModel);

    /**
     * Returns a map of each category to its score for the given text.
     *
     * @param text the text to categorize
     * @return map from category name to the model's score for that category
     */
    private Map<String, Double> getScore(String text) {
        Map<String, Double> scoreMap = new HashMap<>();
        // outcomes[i] is the score for the category returned by getCategory(i)
        double[] outcomes = documentCategorizerME.categorize(text);
        int catSize = documentCategorizerME.getNumberOfCategories();
        for (int i = 0; i < catSize; i++) {
            scoreMap.put(documentCategorizerME.getCategory(i), outcomes[i]);
        }
        return scoreMap;
    }
The returned map then holds each category that you modeled along with its score. You can use the scores to decide which category the input text belongs to, for example by taking the highest-scoring one.
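One way to turn the score map into a decision is to take the highest-scoring category and reject it when the score is too low to trust. The sketch below is plain Java with no OpenNLP dependency. The class name BestCategory, the minScore cut-off, and the "shippingproblems" category are all made up for illustration, and the numbers are not real model output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not an OpenNLP class.
public class BestCategory {

    /**
     * Returns the highest-scoring category, or empty if the map is empty
     * or the best score is below minScore.
     */
    static Optional<String> pick(Map<String, Double> scores, double minScore) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return (best != null && bestScore >= minScore) ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up scores standing in for the map returned by getScore(text)
        Map<String, Double> scores = new HashMap<>();
        scores.put("customerserviceproblems", 0.82);
        scores.put("shippingproblems", 0.18);
        System.out.println(pick(scores, 0.6)); // prints Optional[customerserviceproblems]
        System.out.println(pick(scores, 0.9)); // prints Optional.empty
    }
}
```

A threshold like this is a judgment call that depends on how many categories you have and how costly a wrong label is. If you only ever need the top label and no cut-off, DocumentCategorizerME also offers getBestCategory(double[]), which takes the array returned by categorize.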