I was wondering whether there is a way to convert HTML into a list of sentences, for example using jsoup. What I am looking for is something like:
List<String> convertToSentences(String html);
Sometimes sentences are separated by periods, question marks, and exclamation marks, and sometimes by HTML structures, like <ul> and <p> elements.
For example, given the following HTML:
<p>Hello World. What a great day.</p> // [Hello World, What a great day]
<ul><li>One</li><li>Two</li></ul> // [One, Two]
<p>Today is <strong>great</strong></p> // [Today is great]
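To make the expected behavior concrete, here is a rough sketch of what I have in mind, using only the JDK. It is not a real solution: the class name HtmlSentences and the list of block-level tags are my own assumptions, the regex-based tag handling is fragile on real-world HTML (no entity decoding, no script/style handling, no malformed-markup recovery), and java.text.BreakIterator is only a heuristic for sentence boundaries. A proper HTML parser would presumably replace the two replaceAll calls.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HtmlSentences {

    // Hypothetical helper matching the signature above; a sketch, not a robust parser.
    static List<String> convertToSentences(String html) {
        // Turn block-level tags into hard boundaries (assumed, incomplete tag list).
        String text = html.replaceAll("(?i)</?(p|li|ul|ol|div|br|h[1-6])\\b[^>]*>", "\n");
        // Drop the remaining inline tags such as <strong>, keeping their text.
        text = text.replaceAll("<[^>]+>", "");

        List<String> sentences = new ArrayList<>();
        for (String block : text.split("\n")) {
            // Split each block on sentence boundaries as detected by the JDK.
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
            it.setText(block);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                // Trim whitespace and strip the trailing '.', '!' or '?'.
                String s = block.substring(start, end).trim().replaceAll("[.!?]+$", "");
                if (!s.isEmpty()) {
                    sentences.add(s);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(convertToSentences("<p>Hello World. What a great day.</p>"));
        System.out.println(convertToSentences("<ul><li>One</li><li>Two</li></ul>"));
        System.out.println(convertToSentences("<p>Today is <strong>great</strong></p>"));
    }
}
```

On the three inputs above this should give the lists shown in the comments, but it would break quickly on anything less tidy, which is why I am hoping for an existing library.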
Is there any library out there which does such a thing?