This is useful because I can then do, for example, this:
$xPath->query('//div.class');
So I need a regex that performs these transformations:
Example 1
text().some_class => text()[contains(concat(" ", @class, " "), " some_class ")]
Example 2: nothing to do – it's inside single quotes
@src = 'obr.gif' => @src = 'obr.gif'
Example 3
*.class => *[contains(concat(" ", @class, " "), " class ")]
Example 4
div.class => div[contains(concat(" ", @class, " "), " class ")]
Example 5: do nothing – the subject that should have this class is missing (I know this is not valid XPath)
div[.neco] => div[.neco]
I used PHP's preg_replace this way:
preg_replace(
'/\.([a-z_][\w-]*)/i',
'[contains(concat(" ", @class, " "), " $1 ")]',
$xPath);
That only worked for examples No. 1, 3 and 4. So I updated it:
preg_replace(
'/(?<=[\w*\])])\.([a-z_][\w-]*)/i',
'[contains(concat(" ", @class, " "), " $1 ")]',
$xPath);
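A quick way to see which of the five examples this second pattern handles is a small check like the one below. It is only a sketch: the `$cases` array and the variable names `$pattern`, `$replacement` and `$failed` are my own, and the expected strings are copied from the examples above.

```php
<?php
// Input => expected output, copied from Examples 1-5 above.
$cases = [
    'text().some_class' => 'text()[contains(concat(" ", @class, " "), " some_class ")]',
    "@src = 'obr.gif'"  => "@src = 'obr.gif'",
    '*.class'           => '*[contains(concat(" ", @class, " "), " class ")]',
    'div.class'         => 'div[contains(concat(" ", @class, " "), " class ")]',
    'div[.neco]'        => 'div[.neco]',
];

// The second pattern: the dot must follow a word character, *, ] or ).
$pattern     = '/(?<=[\w*\])])\.([a-z_][\w-]*)/i';
$replacement = '[contains(concat(" ", @class, " "), " $1 ")]';

// Collect the numbers of the examples whose output differs from the expectation.
$failed = [];
$number = 0;
foreach ($cases as $input => $expected) {
    $number++;
    if (preg_replace($pattern, $replacement, $input) !== $expected) {
        $failed[] = $number;
    }
}

echo 'Failed examples: ', implode(', ', $failed), PHP_EOL;
```

The lookbehind has no notion of being inside a quoted string, so the `r` before `.gif` satisfies it just like the `v` before `.class` does.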
Then only No. 2 didn't work. I tried this:
preg_replace(
'/(\'[^\']+\'.*?)*(?<=[\w*\])])\.([a-z_][\w-]*)/i',
'$1[contains(concat(" ", @class, " "), " $2 ")]',
$xPath);
That works for:
//div[@src = 'obr.gif'].class => //div[@src = 'obr.gif'][contains(concat(" ", @class, " "), " class ")]
But for No. 2 it produces the wrong result:
@src = 'obr.gif' => @src = 'obr[contains(concat(" ", @class, " "), " gif ")]'
I realize that the regex engine tries hard to match at least something, so it "ignores" the first parenthesized group (which is optional), but I don't know how to write a regex that works the way I want.
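To illustrate what I mean by the first group being "ignored", here is a minimal sketch that inspects the match for the No. 2 input using the third pattern. The variable names `$pattern` and `$m` are my own.

```php
<?php
// The third pattern from above, unchanged.
$pattern = '/(\'[^\']+\'.*?)*(?<=[\w*\])])\.([a-z_][\w-]*)/i';

preg_match($pattern, "@src = 'obr.gif'", $m);

// Group 1 is quantified with *, so the engine is free to match it zero times.
echo 'whole match: ', var_export($m[0], true), PHP_EOL;
echo 'group 1:     ', var_export($m[1], true), PHP_EOL;
echo 'group 2:     ', var_export($m[2], true), PHP_EOL;
```

Group 1 comes back empty and the whole match is just `.gif`: when starting at the opening quote fails, the engine simply retries at a later position inside the string, where the group matches zero times and the lookbehind is satisfied by the `r`.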
PS: I only use single quotes in my XPath expressions, so I don't need to handle double quotes.
EDIT: funkwurm's answer, modified for PHP:
preg_replace_callback(<<<'CLASS'
/('|").*?(?<!\\)\1|(?<=[\w*\])])\.([a-z_][\w-]*)/i
CLASS
, function($matches) {
return $matches[1] ? $matches[0] : "[contains(concat(\" \", @class, \" \"), \" $matches[2] \")]";
},
$xPath
);
I'm using nowdoc syntax for the regex because then I don't have to deal with escaping inside a quoted string.
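For reuse, the callback version can be wrapped in a function and run against the inputs from the question. This is a sketch under my own naming: `expandClassShorthand` is a hypothetical helper, not part of any library, and the only change to the callback is an explicit comparison of group 1 against the empty string.

```php
<?php
/**
 * Hypothetical helper (the name is mine): expands the ".class" shorthand
 * in an XPath expression and leaves quoted strings untouched.
 */
function expandClassShorthand(string $xPath): string
{
    // First alternative consumes a whole quoted string (group 1 = the quote);
    // second alternative matches ".name" after a word character, *, ] or ).
    $regex = <<<'CLASS'
/('|").*?(?<!\\)\1|(?<=[\w*\])])\.([a-z_][\w-]*)/i
CLASS;

    return preg_replace_callback(
        $regex,
        function (array $matches): string {
            // Group 1 is non-empty only when a quoted string was matched.
            return $matches[1] !== ''
                ? $matches[0]
                : "[contains(concat(\" \", @class, \" \"), \" {$matches[2]} \")]";
        },
        $xPath
    );
}

echo expandClassShorthand("//div[@src = 'obr.gif'].class"), PHP_EOL;
echo expandClassShorthand("@src = 'obr.gif'"), PHP_EOL;
echo expandClassShorthand('div[.neco]'), PHP_EOL;
```

Because the quoted-string alternative comes first and consumes the entire literal, the engine never gets a chance to start a match at the dot inside `'obr.gif'`.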