Is there a way to convert HTML into a list of sentences, for example using JSoup? What I am looking for is a method like:

List<String> convertToSentences(String html);

Sentences are sometimes separated by punctuation (periods, question marks, and exclamation marks) and sometimes by HTML structure, such as <ul> and <p> elements.

For example, given the following HTML:

<p>Hello World. What a great day.</p>    // [Hello World, What a great day]
<ul><li>One</li><li>Two</li></ul>        // [One, Two]
<p>Today is <strong>great</strong></p>   // [Today is great]

Is there any library out there which does such a thing?
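
To make the expected behaviour concrete, here is a minimal sketch of the method using only the JDK. It is not a library recommendation and not JSoup code. The class name HtmlSentences, the list of tags treated as sentence boundaries, and the regex-based tag handling are all my own assumptions; the regexes are a crude stand-in for a real HTML parser and ignore entities, comments, and script content. Sentence splitting inside a block is delegated to java.text.BreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HtmlSentences {

    // Assumption: these tags end a sentence even when no punctuation is present.
    private static final String BLOCK_TAGS =
            "(?i)</?(p|div|li|ul|ol|br|h[1-6]|tr|td|th|table|blockquote)\\b[^>]*>";

    public static List<String> convertToSentences(String html) {
        List<String> sentences = new ArrayList<>();

        // 1. Turn block-level tags into newlines so they act as hard boundaries.
        String text = html.replaceAll(BLOCK_TAGS, "\n");
        // 2. Drop all remaining (inline) tags such as <strong>, keeping their text.
        text = text.replaceAll("<[^>]+>", "");

        // 3. Split each block into sentences on punctuation.
        for (String block : text.split("\n")) {
            String trimmed = block.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
            it.setText(trimmed);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                // Strip trailing ., ! and ? to match the output format in the question.
                String sentence = trimmed.substring(start, end).trim().replaceAll("[.!?]+$", "");
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }
}
```

With a real parser, steps 1 and 2 would be replaced by walking the parsed document and collecting the text of each block-level element; step 3 could stay as it is.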

Erik Pragt
