Yes, it is valid XML. To parse it, you have to understand that XML is represented as a tree of nodes. That XML would parse into the following tree structure.
p
|_ attributes
|  |_ "class"="leaders"
|
|_ children
   |_ #text "Todd"
   |
   |_ span
   |  |_ attributes
   |  |  |_ "class"="leader-type"
   |  |
   |  |_ children
   |     |_ #text "."
   |
   |_ #text "R"
   |
   |_ span
   |  |_ attributes
   |  |  |_ "class"="leader-type"
   |  |
   |  |_ children
   |     |_ #text "."
   |
   |_ #text "Colas"
Each attribute and child node is represented by a separate IXMLNode interface in the TXMLDocument. As you can see, the plain-text portions are separated into their own #text nodes.
Once you have loaded the XML into TXMLDocument, the TXMLDocument.DocumentElement property represents the <p> node. That node's AttributeNodes property contains a "class" node, and its ChildNodes property contains the first level of #text and <span> nodes. The <span> nodes have their own AttributeNodes and ChildNodes lists, and so on. So to parse this, you would iterate through the tree looking for the #text nodes, using the <span> nodes to manipulate the text as needed.
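The traversal described above could look something like the following. This is only a minimal sketch: the CollectText procedure and the program name are made up for illustration, the unit names assume a Delphi version with scoped unit names (older versions use XMLIntf, XMLDoc and ActiveX), and the CoInitialize call assumes the default MSXML vendor on Windows. It simply descends into every element, including the <span> nodes, and concatenates the #text nodes it finds; any span-specific handling would go where the element branch is.

```delphi
program CollectTextSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.ActiveX, Xml.XMLIntf, Xml.XMLDoc;

// Hypothetical helper: appends the content of every #text node below
// Node to Output, visiting the children in document order.
procedure CollectText(const Node: IXMLNode; var Output: string);
var
  I: Integer;
begin
  if Node.NodeType = ntText then
    Output := Output + Node.Text
  else if Node.NodeType = ntElement then
    // <p> and <span> both land here; inspect Node.Attributes['class']
    // at this point if the text needs different treatment per element.
    for I := 0 to Node.ChildNodes.Count - 1 do
      CollectText(Node.ChildNodes[I], Output);
end;

var
  Doc: IXMLDocument;
  S: string;
begin
  CoInitialize(nil); // assumption: MSXML vendor, which needs COM initialized
  try
    Doc := LoadXMLData(
      '<p class="leaders">Todd<span class="leader-type">.</span>' +
      'R<span class="leader-type">.</span>Colas</p>');
    S := '';
    CollectText(Doc.DocumentElement, S);
    Writeln(S);
    Doc := nil; // release the document before COM is uninitialized
  finally
    CoUninitialize;
  end;
end.
```

With a TXMLDocument component already on a form, only the CollectText procedure is needed; call it with Doc.DocumentElement after the XML has been loaded.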
To create such a document, you simply create the individual nodes as needed, e.g.:
// Close and reopen the document before building it
Doc.Active := False;
Doc.Active := True;
// <p class="leaders">
Node := Doc.AddChild('p');
Node.Attributes['class'] := 'leaders';
// #text "Todd"
Child := Doc.CreateNode('Todd', ntText);
Node.ChildNodes.Add(Child);
// <span class="leader-type">.</span>
Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';
// #text "R"
Child := Doc.CreateNode('R', ntText);
Node.ChildNodes.Add(Child);
// <span class="leader-type">.</span>
Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';
// #text "Colas"
Child := Doc.CreateNode('Colas', ntText);
Node.ChildNodes.Add(Child);
Doc.SaveTo...(...); // generate the XML to your preferred output
If you want whitespace/linebreaks to appear in the XML output, simply include those characters in the content of the #text nodes. When parsing XML into TXMLDocument, unnecessary whitespace is stripped off by default. If you want to preserve it, enable the poPreserveWhiteSpace flag in the ParseOptions property before loading the XML.
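For the whitespace-preserving case, a minimal sketch might look like this. The unit name and the LoadPreservingWhitespace function are hypothetical, and the unit names again assume scoped units (XMLIntf and XMLDoc on older versions). The point is only the order of operations: the flag is set on an inactive document, and the XML is loaded afterwards.

```delphi
unit XmlWhitespaceSketch;

interface

uses
  Xml.XMLIntf;

// Hypothetical helper: loads FileName with whitespace-only text kept.
function LoadPreservingWhitespace(const FileName: string): IXMLDocument;

implementation

uses
  Xml.XMLDoc;

function LoadPreservingWhitespace(const FileName: string): IXMLDocument;
begin
  // Created without an owner, so the interface reference controls its lifetime.
  Result := TXMLDocument.Create(nil);
  // Enable the flag before any XML is parsed.
  Result.ParseOptions := Result.ParseOptions + [poPreserveWhiteSpace];
  // Loading the file parses it with the options set above.
  Result.LoadFromFile(FileName);
end;

end.
```

With a TXMLDocument component such as Doc, the equivalent would be to add poPreserveWhiteSpace to Doc.ParseOptions while the document is inactive and then load the XML.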
` so it's not valid XHTML.
– David Heffernan Aug 01 '12 at 15:34