You won't be able to validate an XML document with non-well-formed HTML in it: if the content is not well-formed, the document is not an XML document at all. But if the input you're getting is in fact XML, then you can certainly define data to allow any well-formed HTML elements, or any well-formed XML.
Allowing any well-formed XML is the simplest. We define a pattern that means "any well-formed XML here": any elements encountered are validated using the same pattern, recursively:
wellformed-xml =
  (text
   | element * { wellformed-xml }
  )*
Now define the data element to use that pattern:
stuff =
  element stuff {
    element data { wellformed-xml }
  }
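One caveat: the recursive pattern shown admits only text and child elements, so markup that carries attributes (say, a span with a class attribute) would not validate against it. A minimal sketch of a variant that also accepts arbitrary attributes, assuming that is what you want; the pattern name any-xml is only illustrative:

```rnc
# Sketch only: any-xml is a hypothetical name, not part of the schema above.
# Matches any mix of text, attributes, and elements, recursively.
any-xml =
  (text
   | attribute * { text }
   | element * { any-xml }
  )*

start = stuff
stuff =
  element stuff {
    element data { any-xml }
  }
```

Because any-xml is used directly as the content of data, this version would also let data itself carry any attributes.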
If you really want to ensure that it's just HTML, you'll want a name class more restrictive than "*". I've populated it with b, i, p, span, and div, and leave it as an exercise for you to add the other elements you want.
start = stuff
stuff =
  element stuff {
    element data { wellformed-html }
  }
wellformed-html =
  (text
   | element (b | div | i | p | span) { wellformed-html }
  )*
If you want to be able to support XHTML input as well, you'll want to use a namespace reference; again, an exercise for the reader.
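For what it's worth, here is one way that exercise might start. It is a sketch under assumptions: the HTML elements are taken to be in the standard XHTML namespace, stuff and data are assumed to stay in no namespace, and the prefix h and the pattern name wellformed-xhtml are illustrative choices.

```rnc
# Sketch only: the prefix h and the name wellformed-xhtml are illustrative.
namespace h = "http://www.w3.org/1999/xhtml"

start = stuff
stuff =
  element stuff {
    element data { wellformed-xhtml }
  }
wellformed-xhtml =
  (text
   | element (h:b | h:div | h:i | h:p | h:span) { wellformed-xhtml }
  )*
```

To accept both unqualified and XHTML-qualified names, the name class could list each name in both forms, for example b | h:b.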