I am writing an HTML to Markdown converter in Rust, using Kuchiki to get access to the parsed tree from html5ever.
For unknown HTML tags, I want to offer the option to ignore them and pass them through to the output string, while still processing their children as normal. For that, I need the textual representation of the tag without its contents (i.e. just the opening and closing tags), but I can't figure out how best to get it.
The best I can come up with is:
- Clone the node
- Drop its children
- Call `node.to_string()`
- "parse" the string with a regular expression to separate the opening and closing tags.
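If the clone-and-serialize route is kept, the regex step can probably be replaced by plain string slicing. The HTML serialization algorithm writes a childless, non-void element as `<name attrs></name>`, so the closing tag is exactly `</name>` at the end of the string, and void elements such as `<br>` have no closing tag at all. The sketch below assumes that `serialized` is that output and that `tag_name` is the element's local name as the serializer writes it (lowercase for HTML elements). `split_tags` is a hypothetical helper, not part of Kuchiki:

```rust
/// Hypothetical helper: splits the serialization of a childless element,
/// e.g. `<span class="x"></span>`, into (opening tag, closing tag)
/// without a regular expression.
fn split_tags<'a>(serialized: &'a str, tag_name: &str) -> Option<(&'a str, &'a str)> {
    let closing = format!("</{}>", tag_name);
    match serialized.strip_suffix(closing.as_str()) {
        // Non-void element: everything before the closing tag is the opening tag.
        Some(opening) => Some((opening, &serialized[opening.len()..])),
        // Void element such as `<br>`: the whole string is the opening tag.
        None if serialized.starts_with('<') && serialized.ends_with('>') => {
            Some((serialized, ""))
        }
        // Not something that looks like a serialized element.
        None => None,
    }
}
```

For example, `split_tags(r#"<span class="x"></span>"#, "span")` should give `("<span class=\"x\">", "</span>")`. Because only the suffix is inspected, attribute values that happen to contain `>` or `</span>` do not confuse it the way a naive regex might.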
I feel there must be a better way. I don't think Kuchiki provides this functionality out of the box, but I also don't know how to reach the underlying html5ever API through Kuchiki, and I can't tell from the html5ever API documentation whether it offers anything like this.
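One alternative that avoids cloning and re-parsing altogether is to build the two tag strings directly from the element's name and attributes. This is a minimal sketch under some assumptions: that the name and attributes can be read from Kuchiki's element data (if I read the docs correctly, `node.as_element()` yields an `ElementData` whose `name.local` and `attributes.borrow().map` hold them; not verified here), that namespace prefixes on attributes can be ignored, and that escaping only `&`, `"` and U+00A0 in attribute values, as the HTML serialization algorithm does, is sufficient. The function names are hypothetical. As a side note, if I remember correctly Kuchiki's `NodeRef` is an `Rc` wrapper, so `.clone()` is shallow and dropping the clone's children would also affect the original tree, which this approach sidesteps.

```rust
/// Void elements, which the HTML serialization algorithm writes without a closing tag.
const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr",
    "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr",
];

/// Escapes an attribute value: `&`, `"` and the non-breaking space.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\u{a0}' => out.push_str("&nbsp;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds the opening tag, e.g. `<span class="x">`,
/// from a local name and (attribute name, value) pairs.
fn opening_tag(name: &str, attrs: &[(&str, &str)]) -> String {
    let mut tag = format!("<{}", name);
    for &(key, value) in attrs {
        tag.push(' ');
        tag.push_str(key);
        tag.push_str("=\"");
        tag.push_str(&escape_attr(value));
        tag.push('"');
    }
    tag.push('>');
    tag
}

/// Hypothetical helper: builds the closing tag, or `None` for void elements.
fn closing_tag(name: &str) -> Option<String> {
    if VOID_ELEMENTS.contains(&name) {
        None
    } else {
        Some(format!("</{}>", name))
    }
}
```

In the converter, `opening_tag` would be emitted before recursing into the children and `closing_tag` after, so the unknown element is passed through while its children are processed as usual.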