An HTML file is pleasing to the eye and human-readable when rendered in a browser, but it's hell to understand when seen raw.
Is it possible to extract the text from an HTML fragment and convert it to a simple text file with basic formatting?
I mean a lossy approach: removing CSS, removing superscripts and subscripts, and keeping only as much text and formatting as a human needs to understand the extracted text the way they would understand the original rendered HTML fragment.
P.S.: I've tried regular expressions, and I've tried an inclusive approach that selects only a few tags. Both soon proved impractical, as HTML files can get really tricky.
… parts. A `<ul>` could similarly have its `<li>` elements on separate lines starting with a dash, etc.
– Andrew Morton Jun 21 '18 at 07:53
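For illustration, here is a minimal sketch of that kind of lossy conversion, using Python's standard-library `html.parser` rather than regular expressions. The class name `LossyTextExtractor`, the helper `html_to_text`, and the choice of which tags to drop or turn into line breaks are all assumptions made for this example, not anything prescribed above. In particular, it assumes that "removing superscripts and subscripts" means discarding their content entirely.

```python
import re
from html.parser import HTMLParser


class LossyTextExtractor(HTMLParser):
    """Collects visible text and applies a few plain-text formatting rules."""

    # Assumption: the content of these elements is discarded entirely.
    SKIP = {"style", "script", "sup", "sub"}
    # Assumption: these elements start and end on their own line.
    BLOCK = {"p", "div", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")  # each list item on its own dashed line
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Whitespace inside HTML text is not significant: collapse it.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = LossyTextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    cleaned = (re.sub(r" {2,}", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)


fragment = (
    "<style>p{color:red}</style>"
    "<p>Hello <b>world</b>!<br>Second line<sup>1</sup></p>"
    "<ul><li>one</li><li>two</li></ul>"
)
print(html_to_text(fragment))
```

Because the parser tracks element boundaries itself, nested or oddly formatted markup is handled without the fragile patterns that regular expressions require. The sketch does not try to cope with malformed input such as an unclosed `<sup>`, and tables, links, and images would need rules of their own.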