Since HTML void elements cannot be nested (by definition, a void element has no content), it seems safe to me to process this subset of HTML with regular expressions.
For example, I could add a slash before the closing angle bracket of each void tag, so that the HTML can be processed with XML tools:
s#<((?:area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr)\b[^/]*?)>#<\1/>#
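To make the idea concrete, here is a minimal sketch of the same substitution in Python. The helper name `self_close_void_tags` and the constants are made up for illustration, and the pattern is the one above, unchanged, so this only shows what I have in mind rather than a tested solution.

```python
import re

# Void element names, copied from the pattern above.
VOID_NAMES = "area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr"

# Same pattern as above: a void tag with no "/" anywhere before its ">".
VOID_TAG = re.compile(r"<((?:" + VOID_NAMES + r")\b[^/]*?)>")


def self_close_void_tags(html: str) -> str:
    """Hypothetical helper: rewrite <br> as <br/>, <img ...> as <img .../>, etc."""
    return VOID_TAG.sub(r"<\1/>", html)


print(self_close_void_tags('<p>line<br>next <img alt="x"></p>'))
# prints: <p>line<br/>next <img alt="x"/></p>
```

Because of the `[^/]` part, a tag that is already self-closed (`<br/>`) is left alone, but so is any void tag whose attributes contain a slash, such as a path in `src`.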
Is this assumption correct?