I am trying to write an article summarizer for HTML pages. So far I have used boilerpipe and Classifier4J.
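import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import de.l3s.boilerpipe.BoilerpipeProcessingException;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import net.sf.classifier4J.summariser.SimpleSummariser;

// url can be any URL, passed as a String
public String getArticleSummaryFromUrl(String url) throws IOException {
    // Fetch and parse the page; Jsoup throws IOException if the request fails
    Document doc = Jsoup.connect(url).get();
    String summary = "";
    try {
        // Strip navigation and other boilerplate, keeping only the article text
        String article = ArticleExtractor.INSTANCE.getText(doc.html());
        System.out.println("Article ++++ >>" + article);
        // Ask Classifier4J for a 4-sentence summary
        SimpleSummariser ss = new SimpleSummariser();
        summary = ss.summarise(article, 4);
    } catch (BoilerpipeProcessingException e) {
        e.printStackTrace();
    }
    return summary;
}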
But most of the time the code does not produce the desired results, because the sentences in the summary are not constructed properly.
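For comparison, here is a minimal sketch of the sentence-splitting and selection steps using only the JDK, assuming the problem lies in how the text is cut into sentences. The class name `SentenceSketch`, its two methods, and the word-frequency scoring are illustrative assumptions of mine, not what Classifier4J or smmry.com do internally.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class SentenceSketch {

    // Split text into sentences with the JDK's locale-aware BreakIterator.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Keep the `count` sentences whose words occur most often in the whole
    // text, and return them joined in their original order.
    static String summarise(String text, int count) {
        List<String> sentences = splitSentences(text);

        // Count words longer than 3 characters (a crude stop-word filter).
        Map<String, Integer> freq = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ENGLISH).split("\\W+")) {
            if (w.length() > 3) {
                freq.merge(w, 1, Integer::sum);
            }
        }

        // Score each sentence as the sum of its word frequencies.
        int[] score = new int[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            order.add(i);
            for (String w : sentences.get(i).toLowerCase(Locale.ENGLISH).split("\\W+")) {
                score[i] += freq.getOrDefault(w, 0);
            }
        }

        // Pick the highest-scoring sentences, then restore document order.
        order.sort((a, b) -> Integer.compare(score[b], score[a]));
        List<Integer> keep = new ArrayList<>(order.subList(0, Math.min(count, order.size())));
        Collections.sort(keep);

        StringBuilder out = new StringBuilder();
        for (int i : keep) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(sentences.get(i));
        }
        return out.toString();
    }
}
```

Calling `SentenceSketch.summarise(article, 4)` on the text returned by boilerpipe would give four whole sentences in their original order. Because sentences are only selected and never rewritten, any remaining breakage would come from the extracted text itself rather than from the summarising step.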
I am trying to implement something as neat as http://smmry.com/.
Does anyone know of a Java library that does this?