8

I'm writing out XML files using the MSXML parser, with a wrapper I downloaded from here: http://www.codeproject.com/KB/XML/JW_CXml.aspx. Works great except that when I create a new document from code (so not load from file and modify), the result is all in one big line. I'd like elements to be indented nicely so that I can read it easily in a text editor.

Googling shows many people with the same question - asked around 2001 or so. Replies usually say 'apply an XSL transformation' or 'add your own whitespace nodes'. Especially the last one makes me go %( so I'm hoping that in 2008 there's an easier way to pretty MSXML output. So my question; is there, and how do I use it?

Adam Bellaire
  • 108,003
  • 19
  • 148
  • 163
Roel
  • 19,338
  • 6
  • 61
  • 90
  • Welcome to microsoft hell. A maze of attrocious documentation across 6 versions and multiple languages, most of which are for a different version you're working with. Microsoft usage examples involving macros, gotos, and all manner of crimes against humanity. I am going down the route of creating a stylesheet and apply a transformNodeToObject function using the stylesheet, unfortunately an undocumented exception is being thrown.... – Owl May 30 '17 at 12:54

5 Answers5

4

Here's a modified version of the accepted answer that will transform in-memory (changes only in the last few lines but I'm posting the whole block for the convenience of future readers):

bool CXml::FormatDOMDocument(IXMLDOMDocument *pDoc)
{
    // Create the writer
    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL))) {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler))) {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler))) {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler))) {
        return false;
    }

    if (FAILED (pMXWriter->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader
    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED(pSAXReader.CoCreateInstance(__uuidof (SAXXMLReader), NULL, CLSCTX_ALL))) {
        return false;
    }

    if (FAILED(pSAXReader->putContentHandler (pISAXContentHandler)) ||
        FAILED(pSAXReader->putDTDHandler (pISAXDTDHandler)) ||
        FAILED(pSAXReader->putErrorHandler (pISAXErrorHandler)) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write
    bool success1 = SUCCEEDED(pMXWriter->put_output(CComVariant(pDoc.GetInterfacePtr())));
    bool success2 = SUCCEEDED(pSAXReader->parse(CComVariant(pDoc.GetInterfacePtr())));

    return success1 && success2;
}
Roel
  • 19,338
  • 6
  • 61
  • 90
  • I had some trouble getting this to work at first, until I noticed that I was using __uuidof(MXXMLWriter40) instead of __uuidof(MXXMLWriter). (The code was "successful" but didn't result in indented XML.) I'm a bit surprised that it made that much of a difference. – Miral Mar 30 '11 at 09:10
  • Versioning of MSXML always trips me up too, I have no idea how it's supposed to work. – Roel Mar 30 '11 at 13:59
  • 1
    For me this didn't work with the ".GetInterfacePtr()" bit. I found a VERY similar code snippet without that here: http://social.msdn.microsoft.com/forums/vstudio/en-US/e2e1ec90-bc92-4f70-96ec-94499e862402/add-indent-spaces-to-xml-nodes-with-ms-ixmldom – Chris Dolan May 16 '14 at 19:35
  • This doesn't seem to work for 60 (MXXMLWriter60, SAXXMLReader60, DOMDocument60). Any ideas why? It just creates a blank document when I save it. – Adrian Feb 27 '15 at 15:30
  • This solution just output a blank file. I found a solution [here](http://stackoverflow.com/questions/29036336/is-there-a-way-to-modify-the-style-sheet-so-that-it-transforms-an-xml-document-w). – Adrian Mar 18 '15 at 19:39
  • When I use the same `IXMLDOMDocument` object for both the input and output, I get a blank `IXMLDOMDocument`. I found it necessary to use an intermediate object. – gigaplex Oct 26 '15 at 07:49
3

Even my 2 cents arrive 7 years later I think the question still deserves a simple answer wrapped in just a few lines of code, which is possible by using Visual C++'s #import directive and the native C++ COM support library (offering smart pointers and encapsulating error handling).

Note that like the accepted answer it doesn't try to fit into the CXml class the OP is using but rather shows the core idea. Also I assume msxml6.

Pretty-printing to any stream

void PrettyWriteXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, IStream* stream)
{
    MSXML2::IMXWriterPtr writer(__uuidof(MSXML2::MXXMLWriter60));
    writer->encoding = L"utf-8";
    writer->indent = _variant_t(true);
    writer->standalone = _variant_t(true);
    writer->output = stream;

    MSXML2::ISAXXMLReaderPtr saxReader(__uuidof(MSXML2::SAXXMLReader60));
    saxReader->putContentHandler(MSXML2::ISAXContentHandlerPtr(writer));
    saxReader->putProperty(PUSHORT(L"http://xml.org/sax/properties/lexical-handler"), writer.GetInterfacePtr());
    saxReader->parse(xmlDoc);
}

File stream

If you need a stream writing to a file you need an implementation of the IStream interface.
wtlext has got a class, which you can use or from which you can deduce how you can write your own.

Another simple solution that has worked well for me is utilising the Ado Stream class:

void PrettySaveXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, const wchar_t* filePath)
{
    ADODB::_StreamPtr stream(__uuidof(ADODB::Stream));
    stream->Type = ADODB::adTypeBinary;
    stream->Open(vtMissing, ADODB::adModeUnknown, ADODB::adOpenStreamUnspecified, _bstr_t(), _bstr_t());
    PrettyWriteXmlDocument(xmlDoc, IStreamPtr(stream));
    stream->SaveToFile(filePath, ADODB::adSaveCreateOverWrite);
}

Glueing it together

A simplistic main function shows this in action:

#include <stdlib.h>
#include <objbase.h>
#include <comutil.h>
#include <comdef.h>
#include <comdefsp.h>
#import <msxml6.dll>
#import <msado60.tlb> rename("EOF", "EndOfFile")  // requires: /I $(CommonProgramFiles)\System\ado


void PrettyWriteXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, IStream* stream);
void PrettySaveXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, const wchar_t* filePath);


int wmain()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    try
    {
        MSXML2::IXMLDOMDocumentPtr xmlDoc(__uuidof(MSXML2::DOMDocument60));
        xmlDoc->appendChild(xmlDoc->createElement(L"root"));

        PrettySaveXmlDocument(xmlDoc, L"xmldoc.xml");
    }
    catch (const _com_error&)
    {
    }

    CoUninitialize();

    return EXIT_SUCCESS;
}


// assume definitions of PrettyWriteXmlDocument and PrettySaveXmlDocument go here
klaus triendl
  • 1,237
  • 14
  • 25
2

Try this, I found this years ago on the web.

#include <msxml2.h>

bool FormatDOMDocument (IXMLDOMDocument *pDoc, IStream *pStream)
{

    // Create the writer

    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL)))
    {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler)))
    {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler)))
    {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler)))
    {
        return false;
    }

    if (FAILED (pMXWriter ->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter ->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader

    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED (pSAXReader.CoCreateInstance (__uuidof (SAXXMLReader), NULL, CLSCTX_ALL)))
    {
        return false;
    }

    if (FAILED (pSAXReader ->putContentHandler (pISAXContentHandler)) ||
        FAILED (pSAXReader ->putDTDHandler (pISAXDTDHandler)) ||
        FAILED (pSAXReader ->putErrorHandler (pISAXErrorHandler)) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write

    return 
       SUCCEEDED (pMXWriter ->put_output (CComVariant (pStream))) &&
       SUCCEEDED (pSAXReader ->parse (CComVariant (pDoc)));
}
Torlack
  • 4,395
  • 1
  • 23
  • 24
  • Awesome, worked great. I made some modifications to do this in-memory, I'll post those in a separate post because it won't fit in a comment. – Roel Oct 02 '08 at 21:43
  • I really wish I could find the original source myself. This saved my butt :) – Torlack Oct 02 '08 at 21:44
  • I did find a similar one on http://discuss.com.cn/xml/86274.html, that's where I got the '.GetInterfacePtr()' part from. I don't speak Chinese though so I don't understand most of the page :) – Roel Oct 02 '08 at 21:46
  • What is the stream? I hope it's not a stream of XML, because that would be hideous. – Owl May 30 '17 at 12:56
0

Unless the library has a format option then the only other way is to use XSLT, or an external pretty printer ( I think htmltidy can also do xml) There doen't seem to be an option in the codeproject lib but you can specify an XSLT stylesheet to MSXML.

Martin Beckett
  • 94,801
  • 28
  • 188
  • 263
  • 1
    What a bummer. There is indeed no option in the codeproject lib, it's just a barebones wrapper around MSXML. Is there an easy way to apply the stylesheet (without writing the xml out to disk and formatting it there, then reading it back in)? What would the stylesheet look like? – Roel Oct 02 '08 at 21:11
0

I've written a sed script a while back for basic xml indenting. You can use it as an external indenter if all else fails (save this to xmlindent.sed, and process your xml with sed -f xmlindent.sed <filename>). You might need cygwin or some other posix environment to use it though.

Here's the source:

:a
/>/!N;s/\n/ /;ta
s/  / /g;s/^ *//;s/  */ /g
/^<!--/{
:e
/-->/!N;s/\n//;te
s/-->/\n/;D;
}
/^<[?!][^>]*>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<\/[^>]*>/{
H;x;s/\n//;s/>.*$/>/;s/^    //;p;bb
}
/^<[^>]*\/>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<[^>]*[^\/]>/{
H;x;s/\n//;s/>.*$/>/;p;s/^/ /;bb
}
/</!ba
{
H;x;s/\n//;s/ *<.*$//;p;s/[^    ].*$//;x;s/^[^<]*//;ba
}
:b
{
s/[^    ].*$//;x;s/^<[^>]*>//;ba
}

Hrmp, tabs seem to be garbled... You can copy-waste from here instead: XML indenting with sed(1)

mitchnull
  • 6,161
  • 2
  • 31
  • 23