Should I use XPath or just DOM?

Question

I have a bunch of hierarchical data stored in an XML file. I am wrapping that up behind hand-crafted classes using TinyXML. Given an XML fragment that describes a source signature as a set of (frequency, level) pairs a bit like this:

<source>
  <sig><freq>1000</freq><level>100</level></sig>
  <sig><freq>1200</freq><level>110</level></sig>
</source>

I am extracting the pairs with this:

std::vector< std::pair<double, double> > signature() const
{
    std::vector< std::pair<double, double> > sig;
    for (const TiXmlElement* sig_el = node()->FirstChildElement ("sig");
        sig_el;
        sig_el = sig_el->NextSiblingElement("sig"))
    {
        const double level = boost::lexical_cast<double> (sig_el->FirstChildElement("level")->GetText());
        const double freq =  boost::lexical_cast<double> (sig_el->FirstChildElement("freq")->GetText());
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

where node() is pointing at the <source> node.
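One thing the loop takes on trust is that every <sig> has both children and that each child contains text. In TinyXML, FirstChildElement returns a null pointer when nothing matches and GetText returns a null pointer when the element has no text, so a more defensive version might pass the raw text through a small helper before using it. The following is only a sketch of such a helper: parse_double is a hypothetical name, it assumes C++17 for std::optional, and it uses std::strtod instead of boost::lexical_cast purely to stay dependency-free.

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical helper (not part of TinyXML): convert element text to a double.
// Returns std::nullopt for a null pointer, for empty text, and for text that
// is not entirely a number (trailing characters are rejected).
std::optional<double> parse_double(const char* text)
{
    if (!text || *text == '\0')
        return std::nullopt;
    char* end = nullptr;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0')
        return std::nullopt;
    return value;
}
```

The caller would still have to check the pointer returned by FirstChildElement before calling GetText on it, and then decide whether a missing or malformed value means skipping the pair or reporting an error.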

Question: would I get a neater, more elegant, more maintainable or in any other way better piece of code using an XPath library instead?

Update: I have tried it using TinyXPath two ways. Neither of them actually works, which is obviously a big point against them. Am I doing something fundamentally wrong? If this is what it is going to look like with XPath, I don't think it is getting me anything.

std::vector< std::pair<double, double> > signature2() const
{
    std::vector< std::pair<double, double> > sig;
    TinyXPath::xpath_processor source_proc (node(), "sig");
    const unsigned n_nodes = source_proc.u_compute_xpath_node_set();
    for (unsigned i = 0; i != n_nodes; ++i)
    {
        TiXmlNode* s = source_proc.XNp_get_xpath_node (i);
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

std::vector< std::pair<double, double> > signature3() const
{
    std::vector< std::pair<double, double> > sig;
    int i = 1;
    while (TiXmlNode* s = TinyXPath::xpath_processor (node(), 
        ("sig[" + boost::lexical_cast<std::string>(i++) + "]/*").c_str()).
        XNp_get_xpath_node(0))
    {
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

As a secondary issue, if so, which XPath library should I be using?

score 5 · Accepted Answer · edited Nov 07 '15 at 16:18

In general I tend to prefer XPath based solutions for their concision and versatility but, honestly, in your case, I don't think using XPath will bring a lot to your signature.

Here is why:

Code elegance
Your code is nice and compact and it will not get any better with an XPath expression.

Memory footprint
Unless your input XML configuration file is huge (something of an oxymoron) and DOM parsing would entail a large memory footprint, which XPath has not been shown to cure in any case, I would stick with DOM.

Execution Speed
On such a simple XML tree, execution speed should be comparable. If there were a difference, it would probably be to TinyXml's advantage, because the freq and level tags sit together under a given node.

Libraries and external references
That's the decisive point.
The leading XPath engine in the C++ world is XQilla. It supports XQuery (therefore both XPath 1.0 and 2.0) and is backed by Oracle because it's developed by the group responsible for Berkeley DB products (including precisely Berkeley DB XML, which uses XQilla).
The problem for C++ developers wishing to use XQilla is that they have several alternatives:

use Xerces 2 and XQilla 2.1, and litter your code with casts
use XQilla 2.2+ with Xerces 3 (no casts needed here)
use TinyXPath, which is nicely integrated with TinyXml but has a number of limitations (no support for namespaces, for instance)
mix Xerces and TinyXml

In summary, in your case, switching to XPath just for the sake of it would bring little benefit, if any.

Yet, XPath is a very powerful tool in today's developer toolbox and no one can ignore it. If you just wish to practice on a simple example, yours is as good as any. Then, I'd keep in mind the points above and probably use TinyXPath anyway.

score 3 · Answer 2 · answered Mar 05 '11 at 07:55

You need XPath if you need the flexibility to make runtime changes to the values extracted.
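To make the runtime flexibility point concrete, here is a rough, self-contained sketch. It does not use TinyXML or any real XPath library: Element is a toy tree type and select_path is a hypothetical function that understands only slash-separated child names, a tiny subset of XPath. The point it illustrates is that the path is ordinary string data, so it could come from a configuration file or user input without recompiling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy element tree standing in for a parsed XML document (not TinyXML).
struct Element
{
    std::string name;
    std::string text;
    std::vector<Element> children;
};

// Hypothetical selector: follows a slash-separated list of child names,
// e.g. "sig/freq", and returns every element reached, in document order.
// Predicates, axes and functions are not supported.
std::vector<const Element*> select_path(const Element& root, const std::string& path)
{
    std::vector<const Element*> current(1, &root);
    std::size_t begin = 0;
    while (begin <= path.size())
    {
        std::size_t end = path.find('/', begin);
        if (end == std::string::npos)
            end = path.size();
        const std::string step = path.substr(begin, end - begin);

        // Replace the current node set with the matching children.
        std::vector<const Element*> next;
        for (const Element* el : current)
            for (const Element& child : el->children)
                if (child.name == step)
                    next.push_back(&child);
        current.swap(next);
        begin = end + 1;
    }
    return current;
}
```

With a tree shaped like the <source> fragment in the question, select_path(source, "sig/freq") would return the two freq elements, and changing the string to "sig/level" would return the level elements with no change to the C++ code. A hand-written DOM loop needs an edit and a recompile for the same change.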

But you don't need XPath if you're unlikely to need that kind of flexibility: if a recompile to expand what you're extracting isn't a problem, if things are not being changed too often, or if users never need to update the expressions. The same goes if what you have works fine for you; lots of applications don't use it.

As to whether it's more readable, well yes it sure can be. But if you're just pulling out a few values I'd question the need to pull in another library.

I would certainly document what you currently have a bit better, as those not familiar with TinyXML or XML libraries may not be sure what it's doing, but it's not hard to understand as it is.

I'm not sure what sort of overhead XPath adds, but I suspect it adds some. Most people probably won't notice any difference, but be aware of it in case it's something you're concerned about.

If you do want to use an xpath library then all I can say is that I've used the one that came with Xerces-C++ and it wasn't too hard to learn. I have used TinyXML before and someone here has mentioned TinyXPath. I have no experience with it but it's available.

I also found this link useful when first learning about XPath expressions. http://www.w3schools.com/xpath/default.asp

score 1 · Answer 3 · answered Mar 04 '11 at 14:31

XPath was made for this, so of course your code will be "better" if you use it.

I can't recommend a specific c++ XPath library, but even though using one will be the correct decision most of the time, do a cost/benefit analysis before adding one. Maybe YAGNI.

score 1 · Answer 4 · answered Mar 04 '11 at 18:42

This XPath expression:

/*/sig[$pN]/*

selects all children elements (just the pair freq and level) of the $pN-th sig child of the top element of the XML document.

The string $pN should be substituted with a specific positive integer, for example:

/*/sig[2]/*

selects these two elements:

<freq>1200</freq><level>110</level>
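The substitution of $pN can be sketched in C++ as plain string building. The function name sig_children_xpath is hypothetical, and the resulting string would still have to be handed to whichever XPath engine is in use.

```cpp
#include <string>

// Hypothetical helper: substitute a position for $pN in "/*/sig[$pN]/*".
// XPath positions are 1-based, so the first sig is position 1, not 0.
std::string sig_children_xpath(unsigned position)
{
    return "/*/sig[" + std::to_string(position) + "]/*";
}
```

For example, sig_children_xpath(2) yields the string /*/sig[2]/* shown above.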

Using an XPath expression such as this is obviously much shorter and more understandable than the provided C++ code.

Another advantage is that the same XPath expression can be used from a C# or Java or ... program, without having to modify it in any way -- thus adhering to XPath results in a very high degree of portability.

answered Mar 04 '11 at 18:42

Dimitre Novatchev

I can see how the XPath notation makes it neat to describe a node, but when that's wrapped up into a C++ API the benefit becomes much less clear. – Pete Mar 07 '11 at 11:36
@Pete: Why are you telling me you're not going to wrap all that C++ code into a function? So you will only have `someObj.SelectNode('Expression')` -- good software engineering principles are language-independent. – Dimitre Novatchev Mar 07 '11 at 13:49
@Dimitre Novatchev: I'm not sure what you are getting at. I'm trying to wrap it into the function given in the original question; is that going to be neater with XPath than without? Due to my lack of XPath experience, in your `SomeType something = someObj.SelectNode ("Expression")`, I can't see what SomeType, someObj or Expression are going to be that helps. – Pete Mar 07 '11 at 14:17
@Pete: Why don't you read your documentation about the SelectNodes() and SelectSingleNode() methods? This would give you a good idea. Also, if these two methods are already implemented, why would you need to re-implement them at all? – Dimitre Novatchev Mar 07 '11 at 14:28
@Dimitre: I think I see the confusion: TinyXPath doesn't have a SelectNodes method. Where should I be looking for documentation on these methods? I found some that appears to apply to a JScript API. Anyway, if I get nodes, aren't I back to using DOM methods to get the content out of them? – Pete Mar 07 '11 at 14:40
@Pete: This is the MSDN documentation (click on the "C++" tab): http://msdn.microsoft.com/en-us/library/hcebdtae.aspx – Dimitre Novatchev Mar 07 '11 at 14:50
I see. If only I was using .Net. Thanks though, that's helped to clear up what I should be doing with the library I am using. – Pete Mar 07 '11 at 15:32
@Pete: Exactly. I think this is going to help in your design. – Dimitre Novatchev Mar 07 '11 at 18:19
