
I'm using boost::property_tree to read and write XML configuration files in my application. But when I write the file, the output looks ugly, with lots of empty lines. The file is supposed to be edited by humans too, so I'd like cleaner output.

As an example, I wrote a small test program:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

int main( void )
{
    using boost::property_tree::ptree;
    ptree pt;

    // reading file.xml
    read_xml("file.xml", pt);

    // writing the unchanged ptree in file2.xml
    boost::property_tree::xml_writer_settings<char> settings('\t', 1);
    write_xml("file2.xml", pt, std::locale(), settings);

    return 0;
}

file.xml contains:

<?xml version="1.0" ?>
<config>
    <net>
        <listenPort>10420</listenPort>
    </net>
</config>

after running the program file2.xml contains:

<?xml version="1.0" encoding="utf-8"?>
<config>



    <net>



        <listenPort>10420</listenPort>
    </net>
</config>

Is there a way to have a better output, other than going manually through the output and deleting empty lines?

Null
foke
  • boost::property_tree uses an XML parser called RapidXML, http://rapidxml.sourceforge.net/. Both boost::property_tree and RapidXML are maintained by Marcin Kalicinski. I suggest you contact him directly. You can find his mail address on the RapidXML home page. – Johan Råde Jul 04 '11 at 14:53
  • thanks ildjarn for the edit, but the empty lines are here for a reason! Btw question asked to the maintainer, I'll post the answer if there is one – foke Jul 05 '11 at 16:49

4 Answers


The solution was to add the trim_whitespace flag to the call to read_xml:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

int main( void )
{
    // Create an empty property tree object
    using boost::property_tree::ptree;
    ptree pt;

    // reading file.xml
    read_xml("file.xml", pt, boost::property_tree::xml_parser::trim_whitespace );

    // writing the unchanged ptree in file2.xml
    boost::property_tree::xml_writer_settings<char> settings('\t', 1);
    write_xml("file2.xml", pt, std::locale(), settings);

    return 0;
}

The flag is documented here; the current maintainer of the library (Sebastien Redl) was kind enough to answer and point me to it.

Null
foke
  • Warning: trim_whitespace not only trims whitespace in the XML, but also whitespace in any element that doesn't contain other elements: `xx ` is read as if it was `xx`. – Andreas Haferburg Nov 18 '14 at 13:50
  • It is strange that one needs to change the *read* settings to get this (especially after @AndreasHaferburg's comment). Anyway, in the current version of Boost one needs to use `xml_writer_settings<std::string>` (not `char`). – alfC Jul 03 '15 at 23:05
  • updated "here" link: http://www.boost.org/doc/libs/1_58_0/doc/html/boost/property_tree/xml_parser/read_xml_idp82929296.html – alfC Jul 04 '15 at 04:28
  • There's a gotcha with using `trim_whitespace` when reading: it does more than trim leading and trailing whitespace; it also collapses multiple spaces into a single space. For example `BEGINxxxxEND` (where `x` is a space character) gets collapsed to `BEGINxEND`. This is because internally `trim_whitespace` gets expanded to `parse_normalize_whitespace | parse_trim_whitespace` in xml_parser_read_rapidxml.hpp. We ended up having to hack boost to add a new flag that disables the normalization, because otherwise it broke round-tripping of data in our application. – Chris Kline Dec 13 '16 at 23:40
  • this almost worked for me as is. I used different "settings" to make it work. `const xml_writer_settings< typename Ptree::key_type > settings('\t', 1);`. – J'e Dec 01 '20 at 16:18
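
To make the difference described in the comments concrete, here is a standard-library-only sketch of the two behaviours: trimming the ends only, versus trimming plus collapsing inner whitespace runs. Both function names are hypothetical and this is not RapidXML's code; it only mimics the effect the comments attribute to `parse_trim_whitespace` combined with `parse_normalize_whitespace`.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace only.
std::string trim_only(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();  // whitespace-only input
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: trim both ends and collapse every inner run of
// whitespace into a single space.
std::string trim_and_normalize(const std::string& s)
{
    std::string out;
    bool pending_space = false;
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            pending_space = !out.empty();  // ignore leading whitespace
        } else {
            if (pending_space)
                out += ' ';
            out += static_cast<char>(c);
            pending_space = false;
        }
    }
    return out;  // a trailing run is never appended
}
```

With four spaces between `BEGIN` and `END`, `trim_only` keeps all four while `trim_and_normalize` leaves one, which is why data that depends on inner spacing does not survive a round trip through the second behaviour.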

This question is quite old, but I recently investigated the problem again, because it got a lot worse now that property_tree translates newlines to

&#10;    

In my opinion this is a bug, because elements that contain only whitespace (newlines, spaces and tabs) are treated as text elements. trim_whitespace is only a band-aid, and it normalizes ALL whitespace in the property_tree.

I reported the bug and attached a .diff that fixes this behaviour in Boost 1.59 when trim_whitespace is not used: https://svn.boost.org/trac/boost/ticket/11600

tstrunk

For those trying:

boost::property_tree::xml_writer_settings<char> settings('\t', 1);

When compiling with Boost 1.60.0 in Visual Studio 2013, you may get:

vmtknetworktest.cpp(259) : see reference to class template instantiation 'boost::property_tree::xml_parser::xml_writer_settings<char>' being compiled
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(38): error C2039: 'value_type' : is not a member of '`global namespace''
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(38): error C2146: syntax error : missing ';' before identifier 'Ch'
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(38): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(40): error C2061: syntax error : identifier 'Ch'
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(49): error C2146: syntax error : missing ';' before identifier 'indent_char'
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(49): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(50): error C2825: 'Str': must be a class or namespace when followed by '::'
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(50): error C2039: 'size_type' : is not a member of '`global namespace''
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(50): error C2146: syntax error : missing ';' before identifier 'indent_count'
install\include\boost-1_60\boost/property_tree/detail/xml_parser_writer_settings.hpp(50): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
vmtknetworktest.cpp(259): error C2661: 'boost::property_tree::xml_parser::xml_writer_settings<char>::xml_writer_settings' : no overloaded function takes 3 arguments

You may then end up here:

https://svn.boost.org/trac/boost/ticket/10272

The solution found to work is to use std::string as the template argument:

// assuming a namespace alias: namespace pt = boost::property_tree;
pt::write_xml(file_name, params, std::locale(), pt::xml_writer_make_settings< std::string >(' ', 4));

as described here:

https://stackoverflow.com/a/35043551/7170333

cbuchart
bitminer
  • In my case (Boost 1.81.0) I had to type: `boost::property_tree::xml_writer_settings<std::string> settings(...);` - use `std::string` as template parameter. – Michał Jaroń May 16 '23 at 13:07

Setting 'trim_whitespace' is not the right answer here. Trimming whitespace on read is of no use if you are not allowed to trim the data items themselves (which is what happens when you read with that flag).

I think the actual problem is in the writer, in: https://www.boost.org/doc/libs/1_81_0/boost/property_tree/detail/xml_parser_write.hpp

And instead of:

            // Write data text, if present
            if (!pt.data().empty())
                write_xml_text(stream,
                    pt.template get_value<Str>(),
                    indent + 1, has_elements && want_pretty, settings);

there could perhaps be something like:

            // Write data text, if not empty/white-space only
            auto d = pt.data();
            bool is_empty = d.erase(d.find_last_not_of(" \n\r\t")+1).empty();
            if (!is_empty)
                write_xml_text(stream,
                    pt.template get_value<Str>(),
                    indent + 1, has_elements && want_pretty, settings);

And that seems to fix the 'weird' behaviour seen above IMO - no empty or 'space-only' new-lines are added by the writer.
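
The whitespace-only test used in that proposed change can be tried in isolation. The sketch below uses only the standard library; both function names are hypothetical. `is_blank_via_erase` mirrors the erase-based expression above (working on a copy), and `is_blank` is an equivalent check that does not modify anything.

```cpp
#include <string>

// Hypothetical helper: true when s is empty or holds only spaces, tabs,
// carriage returns or line feeds.
bool is_blank(const std::string& s)
{
    return s.find_first_not_of(" \n\r\t") == std::string::npos;
}

// Same test written like the proposed writer change. The argument is taken
// by value, so the erase only touches a copy. When nothing but whitespace
// is found, find_last_not_of returns npos, npos + 1 wraps to 0, and
// erase(0) clears the whole string.
bool is_blank_via_erase(std::string d)
{
    return d.erase(d.find_last_not_of(" \n\r\t") + 1).empty();
}
```

Either form reports the newline-and-indent text between `<config>` and `<net>` as blank, while a value such as ` 10420 ` is not blank and would still be written.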

Alternatively - the reader could skip this empty 'text' when reading nodes, e.g.:

        // Parse contents of the node - children, data etc.
        template<int Flags>
        void parse_node_contents(Ch *&text, xml_node<Ch> *node)
        {
            // For all children and text
            while (1)
            {
                // Skip whitespace between > and node contents
                Ch *contents_start = text;      // Store start of node contents before whitespace is skipped
                // ****--***-> here - unconditionally (always) skip the WS
                // if (Flags & parse_trim_whitespace)
                    skip<whitespace_pred, Flags>(text);
                Ch next_char = *text;
formiaczek