11

I am making an add-on for Firefox that loads an HTML page using AJAX (the add-on has its own XUL panel).

At this point I have not looked into creating a document object, placing the contents of the AJAX response into it, and then using XPath to find what I need.
Instead, I am loading the contents and parsing them as text with a regular expression.

But I have a question: which would be better to use, XPath or a regular expression? Which is faster?

The HTML page consists of hundreds of elements that contain the same text, and what I basically want to do is count how many such elements there are.

I want my add-on to work as fast as possible, and I do not know the mechanics behind regular expressions or XPath, so I don't know which is more effective.

I hope that was clear. Thanks.

holographic-principle
user1651105
  • 6
    Obligatory link: [**Do not use regex**](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Amarghosh Aug 04 '10 at 13:57
  • Neither is inherently faster than the other - it all depends on their implementations. – Jeff Yates Aug 04 '10 at 14:03
  • 1
    Just wondering, why do some people consider this "not a real question"? Asking for what type of approach is best (or fastest) for a typical programming task seems to me like a genuine question to ask at SO (imo). – Abel Aug 04 '10 at 14:11
  • @Abel - when asking about performance you need to state the requirement, i.e. if one takes 100 ms and the other 101 ms, then it does not matter which is faster – mmmmmm Jul 15 '13 at 12:25

1 Answer

19

Whenever you are dealing with XML, use XPath (or XSLT, XQuery, SAX, DOM or any other XML-aware method to go through your data). Never use regular expressions for this task.

Why? XML processing is intricate. Dealing with all its oddities (external, parsed and unparsed entities, DTDs, processing instructions, whitespace handling, collapsing, Unicode normalization, CDATA sections and so on) makes it very hard to get your data reliably with a regex. The fact that it has taken the industry years to learn how best to parse XML should be reason enough not to try to do it yourself.

To answer your question: when it comes to speed (which should not be your primary concern here), it depends heavily on the implementation of the XPath or regex compiler/processor. Sometimes XPath will be faster (e.g. when using keys, if possible, or compiled XSLT); other times regexes will be faster (if you can use a precompiled regex and your query is simple). But regexes are never simple with HTML/XML, because of the problem of matching nested parentheses (tags), which cannot be solved reliably with regexes alone.
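As a rough illustration of the reliability point, here is a minimal sketch in Python (chosen only for a self-contained demo; in a Firefox add-on the equivalent would be the browser's own DOM and document.evaluate). The sample markup, the `item` element name and both function names are hypothetical, and the input is assumed to be well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: three real <item> elements plus one commented-out copy.
SAMPLE = """<root>
  <item class="a">same text</item>
  <item class='a'>same text</item>
  <!-- <item class="a">same text</item> -->
  <item
      class="a">same text</item>
</root>"""


def count_with_regex(markup: str) -> int:
    """Naive text match: blind to comments, quoting style and line breaks."""
    return len(re.findall(r'<item class="a">same text</item>', markup))


def count_with_xpath(markup: str) -> int:
    """Parse first, then count elements whose text content is 'same text'."""
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset; [.='text'] needs Python 3.7+.
    return len(root.findall(".//item[.='same text']"))


regex_count = count_with_regex(SAMPLE)
xpath_count = count_with_xpath(SAMPLE)
print(regex_count, xpath_count)
```

On this sample the regex reports 2 (it counts the commented-out element and misses the single-quoted and multi-line variants), while the parsed count is 3. The regex could be made more tolerant, but each variation needs its own special case, which is the trap described above.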

If the input is huge, regex will tend to be faster, unless the XPath implementation can do streaming processing (which I believe is not the method used inside Firefox).
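Since speed depends on the implementation, the only dependable answer is to measure on your own input. A minimal sketch of such a measurement, again in Python purely for illustration: the 2,000-element document, the `item` name and the function names are assumptions, and the numbers say nothing about Firefox's own engines.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical input: 2,000 identical elements standing in for the real page.
MARKUP = "<root>" + "<item>same text</item>" * 2000 + "</root>"
PATTERN = re.compile(r"<item>same text</item>")  # precompiled once, reused


def regex_count() -> int:
    """Count matches of the precompiled pattern in the raw text."""
    return len(PATTERN.findall(MARKUP))


def xpath_count() -> int:
    """Parse the text, then count <item> elements; parsing cost is included."""
    return len(ET.fromstring(MARKUP).findall(".//item"))


# Time 20 runs of each approach; absolute values vary by machine.
regex_seconds = timeit.timeit(regex_count, number=20)
xpath_seconds = timeit.timeit(xpath_count, number=20)
print(f"regex: {regex_seconds:.4f}s  xpath: {xpath_seconds:.4f}s")
```

The parse step is timed on purpose, because the page arrives as text either way. If the difference turns out to be a few milliseconds, reliability should decide the matter.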

You wrote:

"which is more effective"

The one that gets you quickest to a reliable and stable implementation that is comparatively speedy. Use XPath. It is also what Firefox and other browsers use internally, if you need your code to run in a browser.

Community
Abel
  • Thanks for the reply. Now I have another newbie question. Would you happen to know how to create a new HTML or XML document object inside the Firefox add-on's XUL? document.evaluate works only with XML and HTML, NOT XUL. I need to somehow put the AJAX response text into a DOM document to be able to use XPath on it. I have spent 40 minutes searching for this but have failed to find anything. I know I could load the contents into a new tab and access it there, but that is not what I want to do. Thanks. (Not sure if I should have created a new question instead of asking in a comment here.) – user1651105 Aug 04 '10 at 14:39
  • 1
    @aleluja: You should ask again for your new question. –  Aug 04 '10 at 14:54
  • great answer, just one more thing to add: in fact the latest xpath technology outperforms regular expressions. – vtd-xml-author Jan 23 '11 at 22:34
  • I might be misinterpreting the question, but if he just wants an /approximate/ number of a number of elements, all of which contain the same text, then regular expressions are fine. In fact, if you know they have the exact same attributes, pure strings should be fine too. – yingted Dec 31 '11 at 00:21
  • @Anonymous: not even in that simplified scenario would I ever suggest using regex. Closed vs non-closed elements, in a comment/cdata section, the simple fact that matching brackets or quotes is relatively hard to accomplish with regexes: no, no, no, it's a trap, don't fall into it. Even when they have the "exact same attributes" you never know whether these same attributes are formatted equally, in the same order, commented out etc. Save yourself a lot of time and use simple but effective proper tools (XPath is one, takes only a few minutes to "count" a document). – Abel Dec 31 '11 at 07:10