
This is a follow-up to my earlier post, "String to XmlNode Delphi (or how to add an xml fragment to TXMLDocument)". It seemed appropriate to start a new question.

I am essentially adding well-formed XML snippets to an existing XML document. The code suggested in the previous solution had been working great, until I added [poPreserveWhiteSpace] to TXMLDocument.ParseOptions.

When I remove [poPreserveWhiteSpace], everything works fine, but whitespace is not preserved; it actually puts the closing tag on a new line.

Here is a code snippet of the target TXMLDocument:

  StoredXMLObj := TXMLDocument.Create(self);
  StoredXMLObj.Options := [doNodeAutoCreate, doNodeAutoIndent];
  StoredXMLObj.ParseOptions := StoredXMLObj.ParseOptions + [poPreserveWhiteSpace];
  StoredXMLObj.XML.Assign(StoredXML);  //StoredXML is a TStringList with a complete XML Document
  StoredXMLObj.Active := TRUE;

I have tried different combinations of the Options and ParseOptions above, but I can only get the code to work by removing [poPreserveWhiteSpace].

The code that triggers the exception is the second line of:

tmpNode := storedXMLObj.DocumentElement.ChildNodes[i]; // <Class> node
tmpNode.ChildNodes.Nodes[1].ChildNodes.Nodes[0].ChildNodes.Add(LoadXMLData(MissingElements[j]).DocumentElement); //TMPNode is an IXMLNode and MissingElements is a TStringList

I tried creating a reference to the return value of LoadXMLData() and setting its ParseOptions to match before adding the XML snippet, but no luck there either.

Any thoughts?

Edit: Added self-contained sample code to demonstrate the problem and clarified the title. Here is some simplified code. Note that there will be an exception unless you comment out the line containing [poPreserveWhiteSpace].

Edit 2: Tweaked the code to preserve whitespace as per Remy's suggestion. It still has a problem when calling FormatXMLData.

procedure TForm2.BitBtn2Click(Sender: TObject);
var
  FragmentXMLObj : TXMLDocument;
  StoredXMLObj : TXMLDocument;
  FragNode : IXMLNode;  //THIS SHOULD BE IXMLNODE, RIGHT?
  XMLStarting, XMLFragment, XMLMerged : TStringList;
  i : integer;
begin
//StringLists to hold xml data
  XMLStarting := TStringList.Create;  //COMPLETE XML
  XMLFragment := TStringList.Create;  //XML FRAGMENT TO INSERT INTO COMPLETE XML
  XMLMerged := TStringList.Create;    //MERGE OF THE ABOVE TWO.

//STARTING XML
  XMLStarting.Add('<?xml version="1.0" encoding="UTF-16" standalone="no"?>');
  XMLStarting.Add('<Programs>');
  XMLStarting.Add(' <Program_Group Batch_No="{12345678-1234-1234-1234-123456789ABC}" Description="FOO_824_1">');
  XMLStarting.Add('     <Program Name="PROG_1">');
  XMLStarting.Add('         <Class Name="CLASS_1">');
  XMLStarting.Add('             <Property Name="DB" RttiType="tkString">      </Property>');
  XMLStarting.Add('             <Property Name="SystemDate" RttiType="tkClass" ClassType="TXSDATE">12/30/1899</Property>');
  XMLStarting.Add('         </Class>');
  XMLStarting.Add('     </Program>');
  XMLStarting.Add(' </Program_Group>');
  XMLStarting.Add('</Programs>');

//XML DOCUMENT OBJECT
  StoredXMLObj := TXMLDocument.create(self);
  //PROBLEM LINE START
  StoredXMLObj.ParseOptions := StoredXMLObj.ParseOptions + [poPreserveWhiteSpace];
  //PROBLEM LINE END
  StoredXMLObj.Options := [doNodeAutoCreate, doNodeAutoIndent];
  StoredXMLObj.XML.Text := XMLStarting.Text;
  StoredXMLObj.Active := TRUE;

//XML FRAGMENT WITH SPACES
  XMLFragment.Add('<ParentNode>');
  XMLFragment.Add('<Property Name="VRSN" RttiType="tkString">    </Property>');
  XMLFragment.Add('<Property Name="ShowMetaData" RttiType="tkBoolean">     </Property>');
  XMLFragment.Add('</ParentNode>');

//--OLD CODE THAT RAISES EXCEPTION--
//INSERTING XML FRAGMENT INTO STARTING XML
//  FragNode := storedXMLObj.DocumentElement.ChildNodes[0];
//  FragNode.ChildNodes.Nodes[0].ChildNodes.Nodes[0].ChildNodes.Add(LoadXMLData(XMLFragment.Text).DocumentElement.ChildNodes.Nodes[0]);
//  FragNode.ChildNodes.Nodes[0].ChildNodes.Nodes[0].ChildNodes.Add(LoadXMLData(XMLFragment.Text).DocumentElement.ChildNodes.Nodes[1]);
//--OLD CODE THAT RAISES EXCEPTION--

  FragNode := storedXMLObj.DocumentElement.ChildNodes[1];
  FragmentXMLObj := TXMLDocument.Create(self);
  FragmentXMLObj.ParseOptions := FragmentXMLObj.ParseOptions + [poPreserveWhiteSpace];
  FragmentXMLObj.Options := [doNodeAutoCreate, doNodeAutoIndent];
  FragmentXMLObj.LoadFromXML(XMLFragment.Text);

  //FragNode.ChildNodes.Nodes[1].ChildNodes.Nodes[1].ChildNodes.Add(FragmentXMLObj.DocumentElement);  //this also pulls in the parent tags, which I don't want.
  for i := 0 to FragmentXMLObj.DocumentElement.ChildNodes.Count-1 do  //easier to just pull in all the nodes (including whitespace, then formatxml to cleanup).
    FragNode.ChildNodes.Nodes[1].ChildNodes.Nodes[1].ChildNodes.Add(FragmentXMLObj.DocumentElement.ChildNodes.Nodes[i]);
  FragmentXMLObj.Free;

  XMLMerged.Text := StoredXMLObj.XML.Text;
  XMLMerged.Text := FormatXMLData(XMLMerged.Text);  //UGH... FormatXMLData WIPES OUT WHITESPACE PROPERTY VALUES!!  Doesn't seem to have any settings either...
  XMLMerged.SaveToFile('c:\merged.xml');

  XMLStarting.Free;
  XMLFragment.Free;
  XMLMerged.Free;
  StoredXMLObj.Free;
end;

The resulting merged XML file. The whitespace property values got wiped out during formatting (and I do need to format the data, otherwise it is REALLY ugly):

<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<Programs>
  <Program_Group Batch_No="{12345678-1234-1234-1234-123456789ABC}" Description="FOO_824_1">
    <Program Name="PROG_1">
      <Class Name="CLASS_1">
        <Property Name="DB" RttiType="tkString"/>
        <Property Name="SystemDate" RttiType="tkClass" ClassType="TXSDATE">12/30/1899</Property>
        <Property Name="VRSN" RttiType="tkString"/>
        <Property Name="ShowMetaData" RttiType="tkBoolean"/>
      </Class>
    </Program>
  </Program_Group>
</Programs>
sse

1 Answer

LoadXMLData() expects the input string to be a well-formed XML document. The solution I gave you for your previous question worked because you were specifying individual XML elements, which by themselves can act as standalone documents. But PCDATA (text content) by itself is not a well-formed XML document. Try wrapping it in a fake element, e.g.:

tmpDoc := LoadXMLData('<Doc>' + MissingElements[j] + '</Doc>').DocumentElement;
for I := 0 to tmpDoc.ChildNodes.Count-1 do
  tmpNode.ChildNodes[1].ChildNodes[0].ChildNodes.Add(tmpDoc.ChildNodes[I]);

Update: You are getting an "index out of bounds" error because you are not taking the whitespace DOM nodes into account when accessing the ChildNodes.

Given the XML you have shown:

XMLStarting.Add('<?xml version="1.0" encoding="UTF-16" standalone="no"?>');
XMLStarting.Add('<Programs>');
XMLStarting.Add(' <Program_Group Batch_No="{12345678-1234-1234-1234-123456789ABC}" Description="FOO_824_1">');
XMLStarting.Add('     <Program Name="PROG_1">');
XMLStarting.Add('         <Class Name="CLASS_1">');
XMLStarting.Add('             <Property Name="DB" RttiType="tkString">      </Property>');
XMLStarting.Add('             <Property Name="SystemDate" RttiType="tkClass" ClassType="TXSDATE">12/30/1899</Property>');
XMLStarting.Add('         </Class>');
XMLStarting.Add('     </Program>');
XMLStarting.Add(' </Program_Group>');
XMLStarting.Add('</Programs>');

And given the code you have shown which is failing:

FragNode := storedXMLObj.DocumentElement.ChildNodes[0];
FragNode.ChildNodes.Nodes[0].ChildNodes.Nodes[0].ChildNodes.Add(LoadXMLData(XMLFragment.Text).DocumentElement.ChildNodes.Nodes[0]);

The following is true:

  1. storedXMLObj.DocumentElement refers to the <Programs> node.
  2. its ChildNodes[0] node refers to the whitespace between the <Programs> and <Program_Group> nodes, but you are expecting it to refer to the <Program_Group> node instead.
  3. thus, FragNode.ChildNodes.Nodes[0] fails because FragNode is a text-only node that has no children!

You can confirm that for yourself. FragNode.NodeName is '#text', FragNode.NodeType is ntText, FragNode.NodeValue is #$A' ', FragNode.HasChildNodes is False, and FragNode.IsTextElement is True.

In other words, the above XML has the following structure to it:

ntElement 'Programs'
|
|_ ntText #$A' '
|
|_ ntElement 'Program_Group'
   |
   |_ ntText #$A'     '
   |
   |_ ntElement 'Program'
   |  |
   |  |_ ntText #$A'         '
   |  |
   |  |_ ntElement 'Class'
   |  |  |
   |  |  |_ ntText #$A'             '
   |  |  |
   |  |  |_ ntElement 'Property'
   |  |  |  |
   |  |  |  |_ ntText '      '
   |  |  |
   |  |  |_ ntText #$A'             '
   |  |  |
   |  |  |_ ntElement 'Property'
   |  |  |  |
   |  |  |  |_ ntText '12/30/1899'
   |  |  |
   |  |  |_ ntText #$A'         '
   |  |
   |  |_ ntText #$A'     '
   |
   |_ ntText #$A' '

Hopefully that makes it a bit clearer.
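To see this structure for a live document instead of working it out by hand, a small recursive dump can help. This is a minimal sketch, not part of the original answer: the unit XmlTreeDump and the procedure DumpNode are hypothetical names, and it assumes Delphi XE2 or later (namespaced units such as Xml.XMLIntf). It prints each node's TNodeType, with text values quoted and line breaks spelled out, so the whitespace-only ntText nodes stand out.

```pascal
unit XmlTreeDump;

// Hypothetical helper (not from the answer): lists every node under a given
// node with its TNodeType, so whitespace-only ntText nodes become visible.

interface

uses
  System.Classes, Xml.XMLIntf;

procedure DumpNode(const Node: IXMLNode; Lines: TStrings; Level: Integer = 0);

implementation

uses
  System.SysUtils, System.TypInfo, System.Variants;

// Spells out line breaks and quotes the value, so a newline followed by
// three spaces is shown as '#10   '.
function VisibleText(const S: string): string;
begin
  Result := StringReplace(S, #13, '#13', [rfReplaceAll]);
  Result := StringReplace(Result, #10, '#10', [rfReplaceAll]);
  Result := '''' + Result + '''';
end;

procedure DumpNode(const Node: IXMLNode; Lines: TStrings; Level: Integer);
var
  I: Integer;
  Line: string;
begin
  // Indent by nesting depth, then print the enum name (ntElement, ntText, ...).
  Line := StringOfChar(' ', Level * 3) +
    GetEnumName(TypeInfo(TNodeType), Ord(Node.NodeType));
  if Node.NodeType = ntText then
    Line := Line + ' ' + VisibleText(VarToStr(Node.NodeValue))
  else
    Line := Line + ' ''' + Node.NodeName + '''';
  Lines.Add(Line);
  // Text nodes have no children, so recursion stops there.
  if Node.HasChildNodes then
    for I := 0 to Node.ChildNodes.Count - 1 do
      DumpNode(Node.ChildNodes[I], Lines, Level + 1);
end;

end.
```

In the question's code this could be called as DumpNode(StoredXMLObj.DocumentElement, Memo1.Lines), where Memo1 is a hypothetical TMemo on the form; running it with and without poPreserveWhiteSpace shows the extra ntText siblings appearing and disappearing.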

So, to accomplish what you are attempting to do, you would need something more like this:

FragNode := storedXMLObj.DocumentElement.ChildNodes[1];
FragNode.ChildNodes.Nodes[1].ChildNodes.Nodes[1].ChildNodes.Add(LoadXMLData(XMLFragment.Text).DocumentElement.ChildNodes.Nodes[0]);
FragNode.ChildNodes.Nodes[1].ChildNodes.Nodes[1].ChildNodes.Add(LoadXMLData(XMLFragment.Text).DocumentElement.ChildNodes.Nodes[1]);
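The hard-coded [1] indexes above only work because every element in this particular XML is preceded by exactly one whitespace node. A hypothetical helper that counts only element children makes the path independent of the whitespace layout. This is a sketch under stated assumptions (the unit and function names are made up; Exit with a value needs Delphi 2009 or later), and it returns nil instead of raising when the requested element does not exist.

```pascal
unit XmlElementNav;

interface

uses
  Xml.XMLIntf;

// Returns the Index-th (0-based) child of Parent whose NodeType is ntElement,
// skipping the whitespace ntText nodes that poPreserveWhiteSpace keeps.
// Returns nil when there is no such child instead of raising an exception.
function ElementChild(const Parent: IXMLNode; Index: Integer): IXMLNode;

implementation

function ElementChild(const Parent: IXMLNode; Index: Integer): IXMLNode;
var
  I: Integer;
  Child: IXMLNode;
begin
  Result := nil;
  if (Parent = nil) or not Parent.HasChildNodes then
    Exit;
  for I := 0 to Parent.ChildNodes.Count - 1 do
  begin
    Child := Parent.ChildNodes[I];
    if Child.NodeType = ntElement then
    begin
      if Index = 0 then
        Exit(Child);   // found the requested element
      Dec(Index);      // count only element nodes
    end;
  end;
end;

end.
```

With the question's document, ElementChild(ElementChild(ElementChild(StoredXMLObj.DocumentElement, 0), 0), 0) walks Programs, Program_Group, Program and lands on the <Class> node whether or not poPreserveWhiteSpace is set.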

If you want to preserve whitespace in the LoadXMLData() fragments, you will have to use TXMLDocument directly instead, since LoadXMLData() does not let you set the poPreserveWhiteSpace flag:

FragmentXMLObj := TXMLDocument.Create(self);
FragmentXMLObj.ParseOptions := FragmentXMLObj.ParseOptions + [poPreserveWhiteSpace];
FragmentXMLObj.Options := [doNodeAutoCreate, doNodeAutoIndent];
FragmentXMLObj.LoadFromXML(XMLFragment.Text);
FragNode.ChildNodes.Nodes[1].ChildNodes.Nodes[1].ChildNodes.Add(FragmentXMLObj.DocumentElement);
FragmentXMLObj.Free;

To avoid any problem with ChildNodes indexes, you are better off using an XPath query instead, so you can let the DOM search for the <Class> node that you want to insert fragments into.
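XPath is one way to let the DOM do the searching. An alternative that stays entirely within IXMLNode is a depth-first search that only matches and descends into element nodes; note this is a swapped-in technique, not XPath itself. The unit and function names below are hypothetical, and the sketch assumes matching a single attribute value is enough to identify the target node.

```pascal
unit XmlElementSearch;

interface

uses
  Xml.XMLIntf;

// Depth-first search for the first element named ElemName whose attribute
// AttrName equals AttrValue. Whitespace text nodes are never matched or
// descended into, so ChildNodes indexes do not matter.
function FindElement(const Node: IXMLNode;
  const ElemName, AttrName, AttrValue: string): IXMLNode;

implementation

function FindElement(const Node: IXMLNode;
  const ElemName, AttrName, AttrValue: string): IXMLNode;
var
  I: Integer;
begin
  Result := nil;
  if (Node = nil) or (Node.NodeType <> ntElement) then
    Exit;  // skip text (including whitespace), comments, etc.
  if (Node.NodeName = ElemName) and Node.HasAttribute(AttrName) and
     (Node.Attributes[AttrName] = AttrValue) then
    Exit(Node);
  for I := 0 to Node.ChildNodes.Count - 1 do
  begin
    Result := FindElement(Node.ChildNodes[I], ElemName, AttrName, AttrValue);
    if Result <> nil then
      Exit;
  end;
end;

end.
```

For the question's data, ClassNode := FindElement(StoredXMLObj.DocumentElement, 'Class', 'Name', 'CLASS_1') would return the <Class> node, whose ChildNodes list the fragment nodes can then be added to.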

Either way, you will soon discover that this does not produce very nice looking XML. If you just want there to be whitespace present, but you don't actually need to preserve the original whitespace as-is, then you are better off disabling the poPreserveWhiteSpace flag, and then use FormatXMLData() when you are saving the final document:

XMLMerged.Text := FormatXMLData(StoredXMLObj.XML.Text);
Remy Lebeau
  • The elements I am adding are exactly the same well-formed elements as before. It's unclear to me why adding whitespace preservation would change TXMLDocument's behavior. – sse Jun 04 '13 at 13:18
  • I tested the code using the same elements as before, and they were added fine with and without whitespace preservation. – Remy Lebeau Jun 04 '13 at 14:34
  • Like I said, when I tried adding new nodes with whitespace preservation enabled in the destination, it worked fine for me. So something else is going on. – Remy Lebeau Jun 04 '13 at 23:42
  • I have no doubt this is true. I will continue digging and share what I find. – sse Jun 05 '13 at 14:04
  • Well, my digging has produced little. I updated my post to include some self contained sample code. Still get a runtime exception when turning on [poPreserveWhitespace]. Does anyone else get the same result? – sse Jun 14 '13 at 18:05
  • You are not taking into account that preserved whitespace is represented as additional IXMLNode objects within ChildNodes lists. You need to ignore those extra nodes when drilling down the DOM tree. Your crash happens because you are not accessing the element nodes you think you are accessing: you land on a whitespace node instead, which has no children, so the next level down is a nil pointer. – Remy Lebeau Jun 15 '13 at 03:54
  • When you say preserved whitespace are you referring to the xml element value - or the surrounding whitespace in between elements? I wonder if you could update your code snippet to reflect what you are suggesting? – sse Jun 17 '13 at 00:58
  • I'm referring to the surrounding whitespace. Preserved whitespace has its own nodes in the DOM tree after parsing. I will update the code snippet tomorrow. – Remy Lebeau Jun 17 '13 at 06:27
  • Interesting. Just an FYI, I trimmed all of the XML in the example above and still got an error. I guess it's considering CRLFs as whitespace too. I don't want to lose any existing whitespace values in the original document either. So the document must be initialized with whitespace on? I mention this because if I initialize with whitespace off, I get a "list index out of bounds" error, but if I initialize with whitespace on (then turn it off before adding elements), I get a PCDATA error. – sse Jun 17 '13 at 13:52
  • Thank you for the update and the code sample. I eliminated the part where you added the entire FragmentXMLObj.DocumentElement (it was pulling in the parent node too). Instead, I pulled in every child node with a loop over the child count (including the blanks, which is fine, since they aren't property values). Unfortunately the formatting is terrible, as predicted, so I used FormatXMLData as suggested, after which the whitespace property values were lost. I updated my post with the changes I made and the resulting XML. The key whitespace I need preserved is in the values. Any other thoughts? – sse Jun 27 '13 at 21:17
  • @sse: Why do the property values need to be whitespace to begin with? That is an unusual requirement. If you need to preserve whitespace, you have to preserve ALL whitespace, which then gets into formatting issues. When does whitespace represent an actual node value, and when does it represent actual whitespace between nodes? There is no way to know. If a node value should be blank, let it be a 0-length value, not whitespace. – Remy Lebeau Jun 27 '13 at 21:55
  • The XML data feeds into two systems, one new and one legacy. The legacy system requires memory buffers with precise property values, including spaces, hence the need to preserve whitespace. If it were in my purview to change the requirements, I would. It's beginning to sound as if it's impossible to preserve whitespace and insert XML fragments, at least with TXMLDocument/MSXML. What do you think? – sse Jun 28 '13 at 16:25
  • It is not impossible, just difficult, because you have to differentiate between types of whitespace. Whitespace between nodes is stored as siblings between the nodes, whereas whitespace as node values is stored as children of the nodes instead. – Remy Lebeau Jun 28 '13 at 20:16
  • Thank you. I am going to accept your answer, as it does preserve whitespace property values. It just doesn't "beautify". I will post a separate question regarding formatting XML (without using FormatXMLData). Btw, I do believe the SAXXMLReader might work for this beautification step. – sse Jul 02 '13 at 19:23
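Following Remy's point that whitespace between nodes is stored as siblings while whitespace values are stored as children, one possible "beautify" step that avoids FormatXMLData treats a whitespace-only text node as layout when its parent also has element children, and as a value when it is the element's only child. The sketch below is a minimal illustration under those assumptions: ReIndent and its unit are hypothetical, it is not how FormatXMLData works internally, and it assumes the document was loaded with poPreserveWhiteSpace and that doNodeAutoIndent has been removed from Options so TXMLDocument does not add indentation of its own. Mixed content (elements plus non-blank text) simply gets indentation inserted before the text.

```pascal
unit XmlWhitespaceIndent;

interface

uses
  Xml.XMLIntf;

// Re-indents Node's subtree in place. Whitespace-only text nodes that sit
// between elements (layout) are removed and replaced by fresh indentation;
// text that is an element's only content (e.g. '      ') is left untouched.
procedure ReIndent(const Doc: IXMLDocument; const Node: IXMLNode;
  Level: Integer = 0);

implementation

uses
  System.SysUtils, System.Variants;

procedure ReIndent(const Doc: IXMLDocument; const Node: IXMLNode;
  Level: Integer);
var
  I: Integer;
  Child: IXMLNode;
  HasElementChild: Boolean;
begin
  HasElementChild := False;
  for I := 0 to Node.ChildNodes.Count - 1 do
    if Node.ChildNodes[I].NodeType = ntElement then
      HasElementChild := True;
  if not HasElementChild then
    Exit; // text-only (or empty) element: keep its value exactly as-is

  // Pass 1 (backwards, because we delete): drop layout whitespace.
  for I := Node.ChildNodes.Count - 1 downto 0 do
  begin
    Child := Node.ChildNodes[I];
    if (Child.NodeType = ntText) and (Trim(VarToStr(Child.NodeValue)) = '') then
      Node.ChildNodes.Delete(I);
  end;

  // Pass 2 (backwards, because we insert): recurse into each child, then
  // put a line break plus indentation in front of it.
  for I := Node.ChildNodes.Count - 1 downto 0 do
  begin
    ReIndent(Doc, Node.ChildNodes[I], Level + 1);
    Node.ChildNodes.Insert(I, Doc.CreateNode(
      sLineBreak + StringOfChar(' ', (Level + 1) * 2), ntText));
  end;

  // Indentation before this element's closing tag.
  Node.ChildNodes.Add(Doc.CreateNode(
    sLineBreak + StringOfChar(' ', Level * 2), ntText));
end;

end.
```

In the question's code this would replace the FormatXMLData call: set StoredXMLObj.Options := [], call ReIndent(StoredXMLObj, StoredXMLObj.DocumentElement), and save the document directly (for example with StoredXMLObj.SaveToFile) rather than passing it through FormatXMLData, which re-parses without preserving whitespace.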