Use xPath or Regex?

Question

The two methods below serve the same purpose: scan the content of the post and determine whether at least one img tag has an alt attribute containing the "keyword" being tested for.

I'm new to XPath and would prefer to use it, depending on how expensive that approach is compared to the regex version...

Method #1 uses preg_match

function image_alt_text_has_keyword($post)
{
    $theKeyword = trim(wpe_getKeyword($post));
    $theContent = $post->post_content;
    $myArrayVar = array();
    // Capture the value of every double-quoted alt attribute inside an <img> tag.
    preg_match_all('/<img\s[^>]*alt="([^"]*)"[^>]*>/siU', $theContent, $myArrayVar);
    foreach ($myArrayVar[1] as $theValue)
    {
        if (keyword_in_content($theKeyword, $theValue)) return true;
    }
    return false;
}

function keyword_in_content($theKeyword, $theContent)
{
    // preg_quote() escapes regex metacharacters in the keyword (and the '/' delimiter).
    return preg_match('/\b' . preg_quote($theKeyword, '/') . '\b/i', $theContent) === 1;
}

Method #2 uses xPath

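function keyword_in_img_alt()
{
    global $post;
    $keyword = trim(strtolower(wpe_getKeyword($post)));
    $dom = new DOMDocument;
    // post_content is an HTML fragment; keep libxml's parser warnings out of the output.
    libxml_use_internal_errors(true);
    $dom->loadHTML(strtolower($post->post_content));
    libxml_clear_errors();
    $xPath = new DOMXPath($dom);
    // Check every <img>, not only images wrapped in <a>, and return a boolean.
    // contains() is a substring test, unlike the \b...\b match in Method #1.
    // Assumes the keyword contains no double quote, which would end the XPath string.
    return $xPath->evaluate('count(//img[contains(@alt, "' . $keyword . '")])') > 0;
}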
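Since the question hinges on cost, one rough way to compare the two approaches is a micro-benchmark. The sketch below is assumption-laden: the WordPress pieces (wpe_getKeyword, $post) are replaced by plain strings, regex_has_keyword and xpath_has_keyword are hypothetical standalone versions of Method #1 and Method #2, and the markup, keyword and iteration count are arbitrary. It only prints timings; which approach wins will depend on the PHP and libxml versions and on the size of the post.

```php
<?php
// Rough micro-benchmark sketch. Markup, keyword and iteration count are arbitrary assumptions.
$html = str_repeat('<p>Some text <img src="x.png" alt="photo of a boat"></p>', 200)
      . '<img src="y.png" alt="A red car">';
$keyword = 'car';
$iterations = 200;

// Hypothetical standalone version of Method #1 (regex extraction + whole-word match).
function regex_has_keyword(string $html, string $keyword): bool
{
    preg_match_all('/<img\s[^>]*alt="([^"]*)"[^>]*>/siU', $html, $m);
    foreach ($m[1] as $alt) {
        if (preg_match('/\b' . preg_quote($keyword, '/') . '\b/i', $alt)) {
            return true;
        }
    }
    return false;
}

// Hypothetical standalone version of Method #2 (DOM parse + XPath substring test).
function xpath_has_keyword(string $html, string $keyword): bool
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML(strtolower($html));
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    return $xpath->evaluate('count(//img[contains(@alt, "' . strtolower($keyword) . '")])') > 0;
}

foreach (['regex_has_keyword', 'xpath_has_keyword'] as $fn) {
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $found = $fn($html, $keyword);
    }
    $ms = (hrtime(true) - $start) / 1e6;
    printf("%s: found=%s, %.2f ms total\n", $fn, var_export($found, true), $ms);
}
```

Treat any numbers it prints as indicative only, and measure on real post content before deciding.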

"constains"? I think you have a typo. – Mark Byers Oct 30 '10 at 17:30

Mark Byers · Accepted Answer · 2010-10-30T17:55:57.713

14

If you are parsing XML, you should use XPath, as it was designed for exactly this purpose. XML/XHTML is not a regular language and cannot be parsed correctly by regular expressions. You may be able to write a regular expression that works some of the time, but there will be special cases where it fails.
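As a hedged illustration of such special cases (the sample markup is made up), the regex from Method #1 finds neither alt attribute in the snippet below, while PHP's DOM parser reads both: one value is single-quoted, and the other comes after a > inside another attribute's value.

```php
<?php
// Hypothetical sample: a single-quoted alt, and an alt placed after a '>' inside title="...".
$html = '<p><img src="a.png" alt=\'red car\'>'
      . '<img title="a > b" src="b.png" alt="blue car"></p>';

// The pattern from Method #1: it expects alt="..." and cannot step past the first '>'.
preg_match_all('/<img\s[^>]*alt="([^"]*)"[^>]*>/siU', $html, $m);
$regexAlts = $m[1];

// An HTML parser understands attribute quoting, so XPath sees both values.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$domAlts = array();
foreach ($xpath->query('//img/@alt') as $attr) {
    $domAlts[] = $attr->value;
}

echo 'regex: ', count($regexAlts), " alt values\n"; // regex: 0 alt values
echo 'DOM:   ', implode(' | ', $domAlts), "\n";      // DOM:   red car | blue car
```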

edited Oct 30 '10 at 17:55

answered Oct 30 '10 at 17:28

Mark Byers

"XPath is used to navigate through elements and attributes in an XML document." From the horse's mouth (W3C). – john mossel Oct 30 '10 at 17:31

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Mads Hansen Oct 30 '10 at 17:31
+1 Using regex on XML is like using a screwdriver to cut down a tree. Using XPath on XML is like using a chainsaw to cut the tree down. Both are useful, but neither can replace the other. – Oct 30 '10 at 17:33

score 4 · Answer 2 · edited May 23 '17 at 10:30


Using RegEx to select nodes in an XML document is about as appropriate as using it to determine whether a given number is prime.

The fact that this is possible doesn't make it even slightly appropriate.

What is more, XPath 2.0 has RegEx support (the matches(), replace() and tokenize() functions), while RegEx has no XPath support. Therefore, if both are needed, it is probably best to use XPath 2.0.

answered Oct 30 '10 at 17:45

Dimitre Novatchev

*(sidenote)* The OP's example code suggests a PHP environment. PHP's DOM extension uses libxml, and libxml does not support XPath 2.0. However, PHP's DOM extension supports calling PHP functions inside XPath expressions, including the regular-expression functions. So while your answer is perfectly correct from a language-agnostic point of view, for PHP it would have to read "PHP's DOMXPath implementation has RegEx support". That still leads to the same conclusion, of course :) – Gordon Nov 05 '10 at 23:14
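A minimal sketch of the PHP route described in that comment: DOMXPath::registerPhpFunctions() exposes preg_match to XPath 1.0 through the http://php.net/xpath namespace, so the whole-word keyword test from Method #1 can run inside the query itself. The function name img_alt_has_keyword and the sample markup are hypothetical.

```php
<?php
// Sketch: regex matching inside XPath via DOMXPath::registerPhpFunctions().
// img_alt_has_keyword() is a hypothetical helper, not part of any library.
function img_alt_has_keyword(string $html, string $keyword): bool
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // fragments often trigger HTML parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPhpFunctions('preg_match'); // allow only preg_match to be called

    // Whole-word, case-insensitive test, like keyword_in_content() in Method #1.
    // Assumes the keyword contains no double quote, which would end the XPath string literal.
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';

    // "= 1" matters: a bare number inside [...] would be read as a position test.
    $count = $xpath->evaluate(
        'count(//img[php:functionString("preg_match", "' . $pattern . '", @alt) = 1])'
    );
    return $count > 0;
}

$html = '<p><img src="a.png" alt="A red car"><img src="b.png" alt="carpet"></p>';
var_dump(img_alt_has_keyword($html, 'car'));   // bool(true)
var_dump(img_alt_has_keyword($html, 'truck')); // bool(false)
```

Restricting registerPhpFunctions() to preg_match keeps the XPath expression from calling arbitrary PHP functions.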
