I want to save text I scrape from various sources without its HTML tags, while keeping as much of the structure as I reasonably can.
Markdown seems to be the solution to this (or possibly MultiMarkdown).
There is a question which offers a suggestion on converting from HTML to Markdown, but I have some specific requirements:
- ALL links (including images) are referenced at the END only (i.e. reference-style links, no inline URLs)
- NO embedded HTML (I'm not even 100% sure yet how I'd like to deal with difficult HTML... but it won't be embedded!)
So my question is as stated in the title: Is there a decent, customisable, HTML to Markdown Java API?
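
To make the first requirement concrete, here is a minimal sketch of the output I'm after, using only the standard library. It is not a converter: it assumes the HTML has already been turned into Markdown with inline links, and the class name `ReferenceLinks` and method `toReferenceStyle` are made up for illustration. The regex is deliberately naive (no link titles, no nested brackets, no URLs containing spaces or parentheses).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites inline links/images as reference-style ones.
class ReferenceLinks {
    // Matches [text](url) and ![alt](url); a simplification of real Markdown.
    private static final Pattern INLINE =
            Pattern.compile("(!?)\\[([^\\]]*)\\]\\(([^)\\s]+)\\)");

    static String toReferenceStyle(String markdown) {
        // URL -> reference number, in order of first appearance.
        Map<String, Integer> ids = new LinkedHashMap<>();
        Matcher m = INLINE.matcher(markdown);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(markdown, last, m.start());
            String url = m.group(3);
            Integer id = ids.get(url);
            if (id == null) {
                id = ids.size() + 1;
                ids.put(url, id);
            }
            // Keep the optional "!" so images stay images.
            out.append(m.group(1))
               .append('[').append(m.group(2)).append("][")
               .append(id).append(']');
            last = m.end();
        }
        out.append(markdown, last, markdown.length());
        if (ids.isEmpty()) {
            return out.toString();
        }
        // Blank line, then one definition per distinct URL at the END.
        out.append("\n");
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            out.append("\n[").append(e.getValue()).append("]: ").append(e.getKey());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toReferenceStyle(
                "See [docs](http://example.com/docs) and "
                + "![logo](http://example.com/logo.png)."));
    }
}
```

Repeated URLs share one reference number, so each URL appears only once at the end. Ideally the API I'm looking for would produce this style directly from HTML, rather than needing a post-processing pass like this.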