The task: take an HTML page and keep only its text, preserving whatever formatting plain text can express. For example, a <br> tag should become \r\n, a table should keep its original row and column structure in the resulting text, and so on.
There is a built-in PHP function, strip_tags(), but it doesn't really fit my requirements: it keeps the contents of <style> and <script> elements, and it loses the formatting because it simply deletes <br>, <table> and other tags.
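To illustrate the problem, here is a minimal sketch of what strip_tags() does with a style block, a line break and a script (the sample markup is made up for illustration):

```php
<?php
// Hypothetical sample markup: a stylesheet, a paragraph with a line break, and a script.
$html = '<style>p { color: red; }</style>'
      . '<p>Line one<br>Line two</p>'
      . '<script>alert(1);</script>';

// strip_tags() removes the tags themselves, but not the text between them,
// so CSS rules and script bodies survive, and <br> leaves no line break behind.
$text = strip_tags($html);

echo $text, PHP_EOL;
```

The CSS rule and the script body end up in the output as ordinary text, while the two lines separated by <br> run together with no newline between them.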
I have also read the Stack Overflow question 'strip html,css from string', but it doesn't contain the answer I'm looking for.
Essentially, I'm looking for a way to render an HTML page to a TXT file (without links and images). Is this possible? Are there any libraries that do this?