
We have to extract all the text from an HTML file without using Jsoup or a similar library. What's the best (or only) way to do that? Our example looks like this:

```html
<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>
<h2>An Ordered HTML List</h2>
<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>
```

I need to get all the text out of these HTML tags without using any libraries, and if a tag is not formed correctly, print an error message. Any help would be appreciated.
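
A hand-written scanner is one way to meet both requirements with only the JDK. The sketch below is a minimal illustration under stated assumptions, not a full HTML parser: it assumes simple markup like the example above (no comments, `<script>`/`<style>` blocks, character entities, or unclosed void elements such as `<br>`). It collects the text between tags and keeps a stack of open tag names, so a mismatched or unclosed tag raises an `IllegalArgumentException` whose message can be printed as the error. The class name `HtmlTextExtractor` and method `extract` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HtmlTextExtractor {

    // Hypothetical helper: returns the trimmed text between tags and throws
    // IllegalArgumentException if a tag is unterminated, mismatched or unclosed.
    // Assumes no comments, <script>/<style>, entities, or unclosed void tags like <br>.
    public static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open tag names
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                text.append(c);
                i++;
                continue;
            }
            // Flush the text collected before this tag
            String t = text.toString().trim();
            if (!t.isEmpty()) {
                texts.add(t);
            }
            text.setLength(0);

            int end = html.indexOf('>', i);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated tag at index " + i);
            }
            String tag = html.substring(i + 1, end).trim();
            if (tag.startsWith("/")) {
                // Closing tag: must match the most recently opened tag
                String name = tag.substring(1).trim().toLowerCase();
                if (open.isEmpty() || !open.peek().equals(name)) {
                    throw new IllegalArgumentException(
                            "Unexpected closing tag </" + name + "> at index " + i);
                }
                open.pop();
            } else if (!tag.endsWith("/")) { // self-closing tags like <br/> are skipped
                String name = tag.split("\\s+")[0].toLowerCase(); // drop attributes
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("Empty tag at index " + i);
                }
                open.push(name);
            }
            i = end + 1;
        }
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            texts.add(t);
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed tag <" + open.peek() + ">");
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>\n"
                + "<h2>An Ordered HTML List</h2>\n"
                + "<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>";
        try {
            System.out.println(extract(html));
        } catch (IllegalArgumentException e) {
            System.err.println("Error: " + e.getMessage());
        }
        try {
            extract("<ul><li>Coffee</ul>"); // </ul> closes while <li> is still open
        } catch (IllegalArgumentException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

For the example input, `main` prints `[Coffee, Tea, Milk, An Ordered HTML List, Coffee, Tea, Milk]` and then reports the mismatched `</ul>` from the second call. A stack is the natural choice here because correct nesting means every closing tag must match the most recently opened one.
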
  • Do you mean "without a library" or "without an external library"? In other words, are you required to do this yourself, or are you just required to introduce no new dependencies? If it is the latter, Java has a built-in XML library that may be appropriate if the HTML file is well-formed XML. – Izruo Dec 13 '21 at 11:42
  • We are not allowed to use external libraries; internal ones should be fine. – ItPlayzzzDaveYT Dec 13 '21 at 11:54
  • Duplicate. You basically just need a regex to remove the tags: [Java regex to strip out XML tags, but not tag contents](https://stackoverflow.com/questions/15769028/java-regex-to-strip-out-xml-tags-but-not-tag-contents) – magicmn Dec 13 '21 at 12:02
  • @magicmn I'm just thinking of [HTML can't be parsed by regex](https://stackoverflow.com/a/1732454/7525132). Since we currently have very little information about what OP is actually trying to do, using regex might be either a perfectly valid or a deeply flawed approach. – Izruo Dec 13 '21 at 12:09
  • Regex parsing only works in PHP, because some regex features I would need aren't available in Java. – ItPlayzzzDaveYT Dec 13 '21 at 12:55
  • @Izruo Yes, HTML shouldn't be parsed using regex, but from the question I understood that all he wants is the text inside the tags, and that is doable with regex. – magicmn Dec 13 '21 at 13:19
  • @ItPlayzzzDaveYT I don't know much about PHP, but why wouldn't you be able to use basic regex features in Java? – magicmn Dec 13 '21 at 13:23
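
Following Izruo's suggestion of the built-in XML support: if the file happens to be well-formed XML, the JDK's DOM parser in `javax.xml.parsers` can collect the text nodes and also throws a `SAXException` (a `SAXParseException` with line and column information) when a tag is not closed properly. This is a sketch under that assumption; HTML that is not valid XML (for example `<br>` without a closing tag, or `&nbsp;`) will be rejected. The class name `XmlTextExtractor` and the `<root>` wrapper are hypothetical; the wrapper is needed because the example has several top-level elements and XML allows only one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlTextExtractor {

    // Parses the markup with the JDK's DOM parser and returns the non-blank
    // text nodes in document order. Assumes the input is well-formed XML.
    public static List<String> extract(String markup)
            throws ParserConfigurationException, SAXException, IOException {
        // Hypothetical <root> wrapper: XML allows only one top-level element
        String wrapped = "<root>" + markup + "</root>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(
                new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
        List<String> texts = new ArrayList<>();
        collectText(doc.getDocumentElement(), texts);
        return texts;
    }

    // Depth-first walk that gathers trimmed, non-empty text nodes
    private static void collectText(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    public static void main(String[] args) throws Exception {
        String html = "<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>\n"
                + "<h2>An Ordered HTML List</h2>\n"
                + "<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>";
        System.out.println(extract(html));
        try {
            extract("<ul><li>Coffee</ul>"); // <li> is never closed
        } catch (SAXException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

The trade-off compared with a hand-written scanner is that the parser does the validation for you, but only for strict XML syntax.
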

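Regarding magicmn's regex suggestion: the standard `java.util.regex` package supports the basic pattern needed to strip tags. A minimal sketch, assuming tags never contain a literal `>` inside attribute values and the text never contains a bare `<`, splits the input on `<[^>]*>` and keeps the non-blank pieces. As Izruo's linked answer warns, this cannot check nesting, so it does not cover the "print an error if a tag is malformed" requirement: `<ul><li>Coffee</ul>` is accepted silently. The class name `RegexTextExtractor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexTextExtractor {

    // Anything from '<' up to the next '>' is treated as a tag
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Splits the input on tags and keeps the trimmed, non-blank pieces.
    // Does NOT validate nesting: malformed markup is not reported.
    public static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        for (String part : TAG.split(html)) {
            String t = part.trim();
            if (!t.isEmpty()) {
                texts.add(t);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>\n"
                + "<h2>An Ordered HTML List</h2>\n"
                + "<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>";
        System.out.println(extract(html));
    }
}
```

For the example this prints `[Coffee, Tea, Milk, An Ordered HTML List, Coffee, Tea, Milk]`, the same text as the stack-based approach, but without any error reporting.
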