
I'm working on a problem where XML exported from our program doesn't escape quotes (turning " into &quot;), leading to problems on the receiving end. It escapes ampersands and angle brackets just fine, but not quotes.

When I dug around in the XML export code, I found that it was a pretty straightforward IXmlDomDocument2 DOM interface. But when I got to the step where it produces the XML string output by calling the .XML method, I ran smack into a wall of proprietariness that I can't trace into, since all the work is taking place inside of C:\Windows\System32\msxml3.dll.

So apparently Microsoft's IXmlDomDocument2 implementation knows how to escape some symbols but not others. And just to make it worse, the obvious but ugly solution (running a preprocessing step that recursively traverses the entire document and replaces all quotes in values with '&quot;' before I call .XML) won't work, because the .XML method will see those ampersands in there and escape them! Is there any way to fix this?

Mason Wheeler
  • Very bad, but you can add a post-processing step replacing those &amp;quot; with &quot; again. – jachguate Feb 03 '11 at 23:28
  • @jachguate: Yeah, I thought of that, but to do that I'd need some sort of parser to distinguish between attributes with quotes and normal data. – Mason Wheeler Feb 03 '11 at 23:41
  • should define entities in DTD before – Free Consulting Feb 03 '11 at 23:44
  • @Worm Regards: Umm... sorry, what does that mean? XML is not my strongest suit. Could you elaborate, maybe in an answer? – Mason Wheeler Feb 03 '11 at 23:48
  • I misunderstood your question. Thought you had a problem with `"` in attributes because that is an illegal character in an attribute. If I now understand correctly you have a problem with `"` in node values (where it is allowed). I deleted my answer and I think it is your "receiving end" that actually has problems and needs fixing. – Mikael Eriksson Feb 08 '11 at 07:03

1 Answer


This could be considered a bug in the XML parser used on the other end. The XML specification lists the predefined entities that can be used for escaping, but quotes only need to be escaped inside attribute values, and that works, as shown here:

program Project2;

{$APPTYPE CONSOLE}

uses
  ActiveX,
  MSXML2_TLB,
  SysUtils;
var
  Dom : IXMLDOMDocument2;
  Root :  IXMLDOMNode;
  Attr : IXMLDOMNode;
begin
  CoInitialize(nil);
  try
    DOM := CoDOMDocument40.Create;
    Root := Dom.createElement('root');
    Attr := Dom.createAttribute('attr');
    Attr.text := '"';
    root.attributes.setNamedItem(Attr);
    root.text := '"Hello World"';
    DOM.appendChild(Root);
    writeln(Root.xml);
    readln;
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end.
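
For comparison, a conforming parser should read a literal quote and `&quot;` in element text as the same character. The following is only a minimal sketch of that point, assuming Python's standard-library `xml.etree.ElementTree` as a stand-in for the receiving end (the actual receiver is unknown, and the two sample strings are hand-written, not captured MSXML output):

```python
import xml.etree.ElementTree as ET

# Two hand-written serializations of the same element: literal quotes in
# the text node versus the predefined &quot; entity. The attribute value
# is escaped in both, since it is delimited by double quotes.
literal = '<root attr="&quot;">"Hello World"</root>'
escaped = '<root attr="&quot;">&quot;Hello World&quot;</root>'


def parsed(xml_text):
    """Parse the string and return (attribute value, text content)."""
    root = ET.fromstring(xml_text)
    return root.get("attr"), root.text


print(parsed(literal))
print(parsed(escaped))
print(parsed(literal) == parsed(escaped))  # True
```

If the receiving end rejects the first form but accepts the second, that points at the receiver rather than at the exported XML.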

But the reality is that you may not have control over the other side of the equation. So you can get the desired behavior by pre-escaping the quotes and then undoing the extra ampersand escaping in the serialized output:

program Project2;

{$APPTYPE CONSOLE}

uses
  ActiveX,
  MSXML2_TLB,
  SysUtils;
function QuoteEscape(const v : String) : String;
begin
  // Pre-escape quotes in the value; MSXML will then escape the & as &amp;
  result := StringReplace(V,'"','&quot;',[rfReplaceAll]);
end;


var
  Dom : IXMLDOMDocument2;
  Root :  IXMLDOMNode;
  Attr : IXMLDOMNode;
begin
  CoInitialize(nil);
  try
    DOM := CoDOMDocument40.Create;
    Root := Dom.createElement('root');
    Attr := Dom.createAttribute('attr');
    Attr.text := '"';
    root.attributes.setNamedItem(Attr);
    root.text :=  QuoteEscape('"Hello World"');
    DOM.appendChild(Root);
    // Undo the double escaping: turn &amp;quot; back into &quot;
    writeln(StringReplace(Root.xml,'&amp;quot;','&quot;',[rfReplaceAll]));
    readln;
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end.
Robert Love
  • I do not have MSXML 4 installed so I can't test your code. But if I change to `CoDOMDocument60.Create;` it works just fine and the quote in attribute is properly escaped. – Mikael Eriksson Feb 04 '11 at 07:07
  • Yes it works regardless of the version. Even using msXML.pas it works for the attributes. – Robert Love Feb 07 '11 at 18:16
  • Ok. I clearly misunderstood the question. I thought that OP had a problem with " in attributes where they are not allowed but he has a problem with " in values between tags? That should not be a problem because that is allowed. Anyway, he said it was a problem so your fix is just fine. – Mikael Eriksson Feb 08 '11 at 06:38