Is this XML "valid"?

<?xml version="1.0"?>
<p class="leaders">
    Todd
    <span class="leader-type">.</span>
    R
    <span class="leader-type">.</span>
    Colas
</p>

I've never seen an XML document with multiple "values" for a node, the way this one has for the <p> node.

How do I parse out the three values for <p> with TXMLDocument? And how do I traverse to the <span> nodes?

Finally, how do I create an XML document like this with TXMLDocument?

Help!!!!

user1498879

2 Answers

When you ask whether it is valid, I think you mean: is it well-formed? (We can't tell whether it is valid without a DTD or schema.)

Yes, it is well-formed. It is a perfectly normal example of a document containing mixed content, which is what XML is designed for.

I can't answer your questions about TXMLDocument because I've never heard of it; presumably it's part of a Delphi XML library.

Michael Kay
  • +1 for explaining the difference between well-formed and valid. – Jeroen Wiert Pluimers Aug 01 '12 at 19:56
  • +1 for "well-formed". `TXMLDocument` is a wrapper for the `XML DOM`. On Windows, it uses `MS XML` by default; on other OSs (or optionally, if that's your preference), it uses Open Source XML (Xerces or OpenXML depending on the Delphi version, IIRC, but I don't use them so I'm not sure). – Ken White Aug 01 '12 at 22:51

Yes, it is well-formed XML. To parse it, you need to understand that an XML document is represented as a tree of nodes. This XML parses into the following tree structure:

p
|_ attributes
| |_ "class"="leaders"
|
|_ children
  |_ #text "Todd"
  |
  |_ span
  | |_ attributes
  | | |_ "class"="leader-type"
  | |
  | |_ children
  |   |_ #text "."
  |
  |_ #text "R"
  |
  |_ span
  | |_ attributes
  | | |_ "class"="leader-type"
  | |
  | |_ children
  |   |_ #text "."
  |
  |_ #text "Colas"

Each attribute and child node is represented by a separate IXMLNode interface in the TXMLDocument. As you can see, the plain-text portions are split into their own #text nodes.

Once you have loaded the XML into TXMLDocument, the TXMLDocument.DocumentElement property represents the <p> node. That node's AttributeNodes property contains a "class" node, and its ChildNodes property contains the first level of #text and <span> nodes. The <span> nodes have their own AttributeNodes and ChildNodes lists, and so on. So to parse this, you would iterate through the tree looking for the #text nodes, using the <span> nodes to manipulate the text as needed.
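The traversal described above can be sketched as a small console program. This is only a minimal sketch under some assumptions: the unit names (Xml.XMLDoc, Xml.XMLIntf, Winapi.ActiveX) are the XE2-style scoped names, the default MSXML vendor is in use (hence the COM initialization), and DumpNode, Run and SampleXml are names made up for this illustration.

```delphi
program WalkMixedContent;

{$APPTYPE CONSOLE}

uses
  Winapi.ActiveX, Xml.XMLIntf, Xml.XMLDoc;

const
  // The document from the question, written without the extra whitespace.
  SampleXml: string =
    '<?xml version="1.0"?>' +
    '<p class="leaders">Todd<span class="leader-type">.</span>R' +
    '<span class="leader-type">.</span>Colas</p>';

// Hypothetical helper: recursively prints ANode and everything below it,
// indenting each level by two spaces.
procedure DumpNode(const ANode: IXMLNode; const AIndent: string);
var
  I: Integer;
begin
  case ANode.NodeType of
    ntText:
      // A plain-text portion of the mixed content.
      Writeln(AIndent, '#text "', ANode.Text, '"');
    ntElement:
      begin
        Writeln(AIndent, '<', ANode.NodeName, '>');
        // Attributes live in their own list, separate from the children.
        for I := 0 to ANode.AttributeNodes.Count - 1 do
          Writeln(AIndent, '  @', ANode.AttributeNodes[I].NodeName,
            '="', ANode.AttributeNodes[I].Text, '"');
        // Children are the #text and element nodes, in document order.
        for I := 0 to ANode.ChildNodes.Count - 1 do
          DumpNode(ANode.ChildNodes[I], AIndent + '  ');
      end;
  end;
end;

// Kept in its own procedure so that all interface references are
// released before COM is uninitialized.
procedure Run;
var
  Doc: IXMLDocument;
begin
  Doc := LoadXMLData(SampleXml);
  DumpNode(Doc.DocumentElement, '');
end;

begin
  CoInitialize(nil); // assumption: the COM-based MSXML vendor is used
  try
    Run;
  finally
    CoUninitialize;
  end;
end.
```

Reading Text on the <p> element itself would not work here, because that element has more than one child; that is why the sketch only reads Text from the #text and attribute nodes.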

To create such a document, you simply create the individual nodes as needed, e.g.:

// Start from an empty document (assumes no FileName or XML is assigned to Doc)
Doc.Active := False;
Doc.Active := True;

Node := Doc.AddChild('p');
Node.Attributes['class'] := 'leaders';

Child := Doc.CreateNode('Todd', ntText);
Node.ChildNodes.Add(Child);

Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';

Child := Doc.CreateNode('R', ntText);
Node.ChildNodes.Add(Child);

Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';

Child := Doc.CreateNode('Colas', ntText);
Node.ChildNodes.Add(Child);

Doc.SaveTo...(...); // generate the XML to your preferred output

If you want whitespace/linebreaks to appear in the XML output, simply include those characters in the content of the #text nodes. When parsing XML into TXMLDocument, unnecessary whitespace is stripped off by default. If you want to preserve it, enable the poPreserveWhiteSpace flag in the ParseOptions property before loading the XML.
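As a rough illustration of the ParseOptions setting, the sketch below loads the same text twice, once with the default options and once with poPreserveWhiteSpace, and prints the first child node of <p> between brackets so that any surrounding whitespace is visible. It assumes XE2-style unit names and the default MSXML vendor; ShowFirstChild and SampleXml are hypothetical names, and the exact default behaviour may differ between DOM vendors.

```delphi
program PreserveWhitespace;

{$APPTYPE CONSOLE}

uses
  Winapi.ActiveX, Xml.XMLIntf, Xml.XMLDoc;

const
  // Indented mixed content, similar to the document in the question.
  SampleXml: string =
    '<p class="leaders">'#13#10 +
    '    Todd'#13#10 +
    '    <span class="leader-type">.</span>'#13#10 +
    '    R'#13#10 +
    '</p>';

// Hypothetical helper: parses SampleXml with the given options and prints
// the text of the first child of <p>, wrapped in brackets.
procedure ShowFirstChild(const ALabel: string; AOptions: TParseOptions);
var
  Doc: IXMLDocument;
begin
  Doc := TXMLDocument.Create(nil); // held by interface, so reference counted
  Doc.ParseOptions := AOptions;    // set the options before loading
  Doc.LoadFromXML(SampleXml);
  Writeln(ALabel, ': [', Doc.DocumentElement.ChildNodes[0].Text, ']');
end;

begin
  CoInitialize(nil); // assumption: the COM-based MSXML vendor is used
  try
    ShowFirstChild('default options', []);
    ShowFirstChild('preserve whitespace', [poPreserveWhiteSpace]);
  finally
    CoUninitialize;
  end;
end.
```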

Remy Lebeau
  • To view the tree, it is probably simplest to open the XML in a web browser and use the built-in DOM inspection tools. In MSIE9 that is the F12 key; in Chrome, right-click and choose "Inspect"; in Opera, both ways work. – Arioch 'The Aug 02 '12 at 15:48