
I have an XML file that might contain thousands of nodes like the ones below.

<?xml version="1.0" encoding="utf-8"?>
<root>
    <node>
        <id>0</id>
        <value>data</value>
    </node>
    <node>
        <id>1</id>
        <value>data</value>
    </node>
    <node>
        <id>2</id>
        <value>data</value>
    </node>
</root>

Each node has an "id" that is guaranteed to be unique, though there may be gaps between ids. What is the most efficient way to find a node by its id using Boost?

Thanks.

UPDATE: This is what I have so far:

property_tree::wptree   ptree; // has been loaded somewhere

auto& nodes = ptree.get_child(L"root"); // bind by reference; plain auto would copy the whole subtree

bool bFound = false;

for (auto& itr : nodes)
{
    auto & rec = itr.second;
    int id = rec.get<int>(L"id", -1);
    if (id == nodeId)
    {
        bFound = true;
        // found it 
        //  get other values
        break;
    }
}

I do not think scanning the whole file to find one item is efficient.

SteveH
  • Please post what you have tried so far (C++ code) and explain why it is not "efficient". – Jim Garrison May 02 '16 at 07:09
  • @JimGarrison Of course I tried it before posting the question here. You don't need to downvote just because someone didn't show a trivial thing in the question. – SteveH May 02 '16 at 08:11
  • Depending on the size of your XML you might not have a choice. E.g. at our place we encounter multi-gigabyte XMLs and you simply can't afford to hold them all in memory. – sehe May 02 '16 at 10:16

1 Answer


I've used LibXml2 with the TextReader interface.

You can keep calling xmlTextReaderRead and check whether a pattern matches (xmlPatterncompile and xmlPatternMatch) to see when a node is matched.

You can even get the full "DOM subtree" at that point, so you have the best of both worlds.

CAVEAT: libxml++'s wrappers of xmlTextReaderExpand() and similar accessors¹ are documented to leak memory. I've fixed this in our local code base, and I might publish a library version of that fix on GitHub given enough interest and permission.

All in all, this nets the same kind of interface as .NET's XPathReader (see "What ever happened to XPathReader").


¹ i.e. TextReader::expand()

sehe
  • Thanks for the recommendation, but for various reasons I have to stick with Boost. – SteveH May 02 '16 at 11:56
  • Sure. Just realize that Boost doesn't have a true XML parser. It has a Property Tree library, and it's not optimized in any way for the use case you describe. – sehe May 02 '16 at 12:18