
I decided to check the memory usage of PropertyTree for XML parsing with the code below. The XML file is a little over 120 MB, but the program was consuming over 2 GB when I decided to kill it. Is this normal memory consumption for PropertyTree, or is something wrong?

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/foreach.hpp>
#include <iostream>

int main()
{
  using boost::property_tree::ptree;
  ptree pt;
  read_xml("c.xml",pt);
  return 0;
}
user965748
  • Do you have a sample XML? (Without further ado: I'd guess it's normal. Boost Property Tree is **not an XML library**. It's a **property tree** library. This means it will be "good enough" for config-file-like applications.) – sehe Aug 23 '15 at 02:28
  • The sample is shown here http://stackoverflow.com/questions/29223415/working-with-a-forest-of-binary-trees-stored-in-a-large-xml-file-php – user965748 Aug 23 '15 at 10:13
  • That's not 120MB of XML. – Puppy Aug 23 '15 at 14:17

1 Answer


Running your exact snippet, compiled with GCC 4.8 on 64-bit Linux and using the 117 MiB input XML here, I get a peak memory usage of 2.1 GiB:

[image: memory profile of the PropertyTree run (not preserved)]

The whole thing executes in ~4-14s, depending on optimization flags. With tcmalloc it even drops to 2.7s.

You can see that at least 50% of the memory is held directly in the ptree containers. In your PHP question you (correctly) mentioned that reading everything into a single DOM is not a great idea.

Even so, if you use a more appropriate/capable library, like PugiXML, the execution is over 10x as fast and the memory usage is roughly 1/6th:

[image: memory profile of the PugiXML run (not preserved)]

Here's the code:

#include <pugixml.hpp>
#include <iostream>

int main() {
    pugi::xml_document doc;
    doc.load_file("input.xml");
}

Imagine what happens if you optimize for memory usage by using a streaming API.

sehe
  • That's right. When I tried RapidXML, memory usage was around 3.7x the file size; with PugiXML it was 2.75x. Boost isn't really appropriate for XML parsing. – user965748 Aug 24 '15 at 02:37