I would like to run some XQuery commands with BaseX over an HTML source that may be full of `<script>` and `<style>` nodes, which must be removed, and unclosed tags (`<br>`, `<img>`), which must be closed (for example, the messy source of this page).
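
To make the requirement concrete, here is a rough sketch of the cleanup I have in mind, written in Python with only the standard library. It is not a BaseX or Tidy feature; the names `Cleaner`, `clean`, `VOID` and `DROP` are my own, and it assumes attribute names are already valid XML names. It drops `<script>` and `<style>` together with their content, self-closes void elements, and closes any tags left open.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have a closing tag in HTML (assumed list, not exhaustive).
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}
# Elements to remove together with their content.
DROP = {"script", "style"}


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # pieces of the cleaned output
        self.open_tags = []   # stack of elements still waiting for a close
        self.skipping = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skipping += 1
            return
        if self.skipping:
            return
        # Valueless attributes (e.g. "disabled") get their own name as value.
        text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.open_tags.append(tag)
            self.out.append(f"<{tag}{text}>")

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skipping = max(0, self.skipping - 1)
            return
        if self.skipping or tag in VOID or tag not in self.open_tags:
            return  # stray or irrelevant close tag: ignore it
        # Close everything opened after the matching start tag, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data))


def clean(html):
    parser = Cleaner()
    parser.feed(html)
    parser.close()  # flushes any buffered trailing text
    # Close whatever the source left open, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


if __name__ == "__main__":
    print(clean('<p>Hi<br><img src="a.png"><script>var x = 1;</script></p>'))
```

A real page would also need handling for things like duplicate attributes, a doctype, and multiple top-level nodes, which is why I am looking for an existing tool rather than writing my own.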
"Converting HTML to XML" suggests using Tidy, but it has no GUI and doesn't seem to work correctly on my source (it outputs nothing), and I doubt that it removes scripts and other unnecessary tags. It is also very old.
I couldn't find an existing question that addresses my needs, so I am asking a new one. Since the topic is closely tied to tools for coding and querying, I am asking it here.