
I am trying to pretty-print an XML file. As suggested in some other SO questions, I am using the following stylesheet for the transformation:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml" encoding="UTF-16" />
  <xsl:strip-space elements="*"/>
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

However, this does not produce the desired results. For an input file of:

```xml
<A><B><C /></B></A>
```

the generated output is:

```xml
<?xml version="1.0" encoding="UTF-16"?>
<A>
<B>
<C>
</C>
</B>
</A>
```

But the output I expect is (the XML declaration line doesn't matter):

```xml
<A>
    <B>
        <C />
    </B>
</A>
```

So there are two problems:

  • There is no indentation in the output
  • The <C /> tag has been "unpacked", which I don't want.

I have tried with MSXSL.exe and with `IXMLDOMDocument2::transformNode` (called from C++) outputting to a BSTR; both methods produce identical output.

What's going wrong here?

M.M
  • The MSXSL.exe utility is 11 years old. `IXMLDOMDocument2` is also tremendously old. I would suggest making your own simple utility using [`XslCompiledTransform`](https://msdn.microsoft.com/en-us/library/system.xml.xsl.xslcompiledtransform(v=vs.110).aspx). That is the recommended .NET API for executing XSLTs and will surely handle indentation correctly. You could also use something like Saxon, but the current version is for XSLT 2.0 and you may encounter some compatibility issues if you are writing XSLT 1.0. – JLRishe Jan 25 '15 at 05:41
  • @JLRishe My goal is to do this programmatically from C++ (not .NET); is there a COM version of that (or some other option)? MSXSL is only 25KB, so presumably it just offloads to some other Windows facility, which ought to be up to date. – M.M Jan 25 '15 at 05:44
  • Yes, MSXSL.exe is just a wrapper around a few different COM interfaces for XSLT (you can obtain the utility's source code from the link you provided and see the msxmlinf.cxx file), which most likely haven't been updated in 11 years. I'm afraid I don't know of a better option for COM; I don't usually work with COM or native code. – JLRishe Jan 25 '15 at 05:56
  • @JLRishe OK, thanks. I have a backup option in that using SAXXMLReader with MXXMLWriter works. The XSLT option is much less ugly though, it would have been nice to get that working. – M.M Jan 25 '15 at 05:58
  • [This SO answer](http://stackoverflow.com/a/11266249/1945651) provides an XSLT that manually adds indentation to an XML document. Perhaps you could use that as a post-processing step? – JLRishe Jan 25 '15 at 06:08
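In the same spirit as that post-processing suggestion, but done in C++ on the serialized string rather than with a second XSLT, here is a minimal sketch (not from the thread) of re-indenting the compact markup yourself. `indentXml` and `splitTags` are hypothetical helper names. The sketch assumes element-only content (no text nodes, comments, CDATA sections or DOCTYPE) and no `>` inside attribute values, which holds for the sample input above. Self-closing tags such as `<C />` are copied through unchanged, and a start tag immediately followed by its end tag stays on one line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split element-only markup into its tags,
// dropping any whitespace between them.
// Assumes no text content, comments, CDATA sections or '>' inside attribute values.
std::vector<std::string> splitTags(const std::string& xml) {
    std::vector<std::string> tags;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) break;  // truncated input: stop here
        tags.push_back(xml.substr(pos, end - pos + 1));
        pos = end + 1;
    }
    return tags;
}

// Hypothetical helper: re-indent element-only markup, one tag per line,
// leaving self-closing tags such as <C /> exactly as written.
std::string indentXml(const std::string& xml, const std::string& unit = "    ") {
    const std::vector<std::string> tags = splitTags(xml);
    std::string out;
    std::size_t depth = 0;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        const std::string& tag = tags[i];
        const bool isEnd = tag.compare(0, 2, "</") == 0;
        const bool isSelfClosing = tag.size() >= 2 && tag.compare(tag.size() - 2, 2, "/>") == 0;
        const bool isDecl = tag.compare(0, 2, "<?") == 0;  // e.g. <?xml ...?>
        if (isEnd && depth > 0) --depth;  // end tags line up with their start tags
        std::string line;
        for (std::size_t d = 0; d < depth; ++d) line += unit;
        line += tag;
        if (!isEnd && !isSelfClosing && !isDecl) {
            // Start tag: keep an immediately following end tag on the same line
            // so an empty <C></C> is not split across two lines.
            if (i + 1 < tags.size() && tags[i + 1].compare(0, 2, "</") == 0) {
                line += tags[++i];
            } else {
                ++depth;
            }
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

For the question's input, `indentXml("<A><B><C /></B></A>")` returns the expected output shown in the question (with four-space indentation), followed by a trailing newline. Anything beyond this simple element-only shape is better served by a real serializer such as the SAX/MXXMLWriter route mentioned above.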

1 Answer

The following WSH (Windows Script Host) JScript program using MSXML 6.0 (which is available by default on all supported Microsoft operating systems, without any installation) outputs:

```xml
<?xml version="1.0" encoding="UTF-16"?>
<A>
        <B>
                <C></C>
        </B>
</A>
```

The program is:

```javascript
var msxmlVersion = '6.0';

// Load the input document synchronously
var xml = new ActiveXObject('Msxml2.DOMDocument.' + msxmlVersion);
xml.async = false;
xml.load('test2015012501.xml');

// Load the stylesheet synchronously
var xsl = new ActiveXObject('Msxml2.DOMDocument.' + msxmlVersion);
xsl.async = false;
xsl.load('test2015012501.xsl');

// Transform into a second DOM document rather than into a string
var resultDoc = new ActiveXObject('Msxml2.DOMDocument.' + msxmlVersion);
xml.transformNodeToObject(xsl, resultDoc);

// Print the serialized result
WScript.Echo(resultDoc.xml);
```

The input and the XSLT are your samples. So with MSXML 6.0 and `transformNodeToObject` you get better indentation, although for my needs it uses too many indent characters.

Of course, instead of using JScript, you should be able to use MSXML 6 from C++ and get the same results.

And if you want a file instead of a string, you can of course use `resultDoc.save('file.xml')`.

Martin Honnen
  • Instead of `WScript.Echo(resultDoc.xml)` I think explicitly recommending `resultDoc.save('outFile.xml')` is better, because it reduces the danger of people using FileSystemObject/TextStream to save XML and potentially running into file encoding issues. – Tomalak Jan 25 '15 at 10:38
  • How about the issue of `<C />` being changed to `<C></C>`? – M.M Jan 25 '15 at 12:51
  • In XML, `<C/>` and `<C />` as well as `<C></C>` are all markup with the same semantics, namely a `C` element with no child nodes, so that change is allowed. The result you got, with spaces or line breaks inserted into a previously empty element, is a bug in my view. – Martin Honnen Jan 25 '15 at 13:02
  • I'm aware that `<C />` and `<C></C>` have the same semantics; however, I'd prefer the former version for various reasons. – M.M Jan 25 '15 at 13:11
  • OK, using `transformNodeToObject` outputting to another DOM document, gave the correct indentation; however using `transformNode` outputting to `BSTR` gave the faulty indentation. Weird... – M.M Jan 25 '15 at 13:18
  • I am not aware of any setting in the MSXML world to ensure the serialization of an empty element as `<C />`, sorry. The .NET DOM has https://msdn.microsoft.com/en-us/library/system.xml.xmlelement.isempty%28v=vs.110%29.aspx but I don't think there is anything similar in the MSXML DOM, nor am I aware of any serialization settings. – Martin Honnen Jan 25 '15 at 13:26
  • @MartinHonnen OK, thanks for the progress anyway. The MXWriter maintains whatever form the input was in, so I guess I will go with that solution instead of the XSLT one. – M.M Jan 25 '15 at 13:31
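If you do stay with `transformNodeToObject` (which indents correctly but writes `<C></C>`), one hedged workaround is to rewrite empty start/end tag pairs in the serialized string afterwards. The following C++ sketch is not from the thread; `collapseEmptyElements` is a hypothetical name, and it relies only on the standard `std::regex_replace`. It assumes no `>` inside attribute values and no such tag pairs inside comments or CDATA sections. It only collapses pairs with nothing at all between the tags, so an element containing whitespace (which would be a text node) is left alone.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite empty start/end tag pairs such as <C></C>
// or <D a="1"></D> as self-closing tags (<C />, <D a="1" />).
// Assumes no '>' inside attribute values and no such pairs inside
// comments or CDATA sections. Pairs with anything between the tags,
// including whitespace, are left untouched.
std::string collapseEmptyElements(const std::string& xml) {
    // Group 1: element name; group 2: optional attributes, starting with whitespace.
    // The back-reference \1 requires the end tag to repeat the same name.
    static const std::regex emptyPair(R"(<([A-Za-z_][^\s<>/]*)(\s[^<>]*)?></\1>)");
    // An unmatched group 2 formats as an empty string, giving "<C />".
    return std::regex_replace(xml, emptyPair, "<$1$2 />");
}
```

Applied to the `resultDoc.xml` string from the answer, this should turn `<C></C>` into `<C />` while keeping MSXML's indentation. It is a textual fix-up, so the MXXMLWriter approach remains the more robust choice for arbitrary input.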