Why can't the Boost Property Tree XML serializer preserve multi-line values?

Question

I am working with Boost Property Tree (v.1.72.5) to read and write XML files. I know that, according to the documentation:

The XML storage encoding does not round-trip perfectly. A read-write cycle loses trimmed whitespace, low-level formatting information, and the distinction between normal data and CDATA nodes. Comments are only preserved when enabled. A write-read cycle loses trimmed whitespace; that is, if the origin tree has string data that starts or ends with whitespace, that whitespace is lost.

Because of this known behavior, if I read and write an XML file without the boost::property_tree::xml_parser::trim_whitespace option, my XML file can end up looking like this:

  <MyInfo id="info_1" title="" description="" >
     
     
     
     234
     <My_Nums>
        <My_Num>0</My_Num>
        <My_Num>1</My_Num>
     </My_Nums>
  </MyInfo>

In this case, reading the value of MyInfo (i.e. 234) fails. If I use the boost::property_tree::xml_parser::trim_whitespace option instead, I can read the value properly (i.e. 234), but all of my multi-line strings are converted into single-line values (the line breaks are removed).

How can I read the value of the MyInfo tag properly and, at the same time, preserve my multi-line values?

"Why" - because that's what the specs say. "How to fix" - use an XML library. https://stackoverflow.com/questions/9387610/what-xml-parser-should-i-use-in-c/9387612#9387612 Boost Property Tree is not an XML library. — sehe, Jun 15 '21 at 11:49
@sehe "because that's what the specs say." -> But I cannot find anywhere in the spec that says reading some values (e.g. 234 in the above example) will fail if the XML file contains some whitespace. — TonySalimi, Jun 15 '21 at 13:13
You quoted the bits that promise that the trimmed whitespace will be lost. — sehe, Jun 15 '21 at 15:12
@sehe Yes, the trimmed whitespace will be lost according to the spec, but it says nothing about this behavior leading to an error while reading a value! — TonySalimi, Jun 15 '21 at 15:39
I think you're placing a bit too much confidence in a particular, very narrow interpretation of that text. I'd read it as not promising anything that it doesn't explicitly promise, rather than magically promising things that aren't excluded. You can argue that the documentation could be clearer, but really, you already know how the library behaves. The fundamental observation is still: it behaves this way because you're using a _property tree library_ to read XML. It's not going to do what you expect from XML, because that's not what it's for. — sehe, Jun 15 '21 at 20:19

score 0 · Accepted Answer · answered Jul 07 '21 at 09:56

As @sehe mentioned in the comments, this is simply how Boost Property Tree is implemented. I think the best workaround for this problem is to always use the boost::property_tree::xml_parser::trim_whitespace flag and to put each value into a specific child tag. With this approach, my XML file always looks like this:

  <MyInfo id="info_1" title="" description="" >
     <value>234</value>
     <My_Nums>
        <My_Num>0</My_Num>
        <My_Num>1</My_Num>
     </My_Nums>
  </MyInfo>

