2

What is the best way to escape a variable passed to xpath()?

$test = simplexml_load_file('test.xml');
$var = $_GET['var']; // injection heaven
$result = $test->xpath('/catalog/items/item[title="'.$var.'"]');

Normally I use PDO binding or something similar, but those all require a database connection. Is it enough to just use addslashes and htmlentities?
Or is there a better way to do this?

janw

4 Answers

3

You can't really make a general XPath escape function, but you can make an XPath quote function, which can be used like this:

$result = $test->xpath('/catalog/items/item[title='.xpath_quote($var).']');

implementation:

//based on https://stackoverflow.com/a/1352556/1067003
function xpath_quote(string $value):string{
    if(false===strpos($value,'"')){
        return '"'.$value.'"';
    }
    if(false===strpos($value,'\'')){
        return '\''.$value.'\'';
    }
    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("'foo'", '"', "bar")
    $sb='concat(';
    $substrings=explode('"',$value);
    for($i=0;$i<count($substrings);++$i){
        $needComma=($i>0);
        if($substrings[$i]!==''){
            if($i>0){
                $sb.=', ';
            }
            $sb.='"'.$substrings[$i].'"';
            $needComma=true;
        }
        if($i < (count($substrings) -1)){
            if($needComma){
                $sb.=', ';
            }
            $sb.="'\"'";
        }
    }
    $sb.=')';
    return $sb;
}

It is based on the C# XPath quote function from https://stackoverflow.com/a/1352556/1067003

Is it enough to just use addslashes and htmlentities? Or is there a better way to do this?

I would sleep better at night using a proper XPath quote function rather than addslashes/htmlentities, but I don't really know whether those are technically sufficient.
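
To make the comparison concrete, here is a minimal sketch that runs the same lookup three ways against a small in-memory document. The sample XML and the variable names are made up for illustration. It assumes the XPath engine follows the XPath 1.0 literal rules, under which neither backslashes nor HTML entities have any special meaning inside a literal.

```php
<?php
// Hypothetical sample data: one title contains double quotes.
$xml = simplexml_load_string(
    '<catalog><items>'
    . '<item><title>Plain</title></item>'
    . '<item><title>Some "quoted" title</title></item>'
    . '</items></catalog>'
);

$var = 'Some "quoted" title';

// addslashes() inserts backslashes, but the backslash is not an escape
// character in XPath 1.0, so the embedded " still ends the literal and the
// expression should be rejected as invalid (warning suppressed with @).
$viaSlashes = @$xml->xpath('/catalog/items/item[title="' . addslashes($var) . '"]');

// htmlentities() turns " into &quot;, so the literal stays intact, but it is
// then compared as the text "&quot;" and no longer equals the real title.
$viaEntities = $xml->xpath('/catalog/items/item[title="' . htmlentities($var) . '"]');

// The value contains no single quote, so a single-quoted literal can hold it
// unchanged. This is what a quote function picks automatically.
$viaQuoting = $xml->xpath("/catalog/items/item[title='" . $var . "']");

var_dump(empty($viaSlashes));   // the query fails instead of matching
var_dump(count($viaEntities));  // number of matches via htmlentities
var_dump(count($viaQuoting));   // number of matches via correct quoting
```

So in this sketch the two string functions do not open an injection hole for this particular value, but they also stop the query from finding titles that contain quotes, which is the practical reason to prefer a quote function.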

hanshenrik
2

According to the XPath 1.0 spec, the syntax for literals is as follows:

[29]    Literal    ::=      '"' [^"]* '"'   
                          | "'" [^']* "'"

This means that a single-quoted string may contain anything other than a single quote, and a double-quoted string may contain anything other than a double quote. The grammar defines no escape sequences inside a literal.
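
As a rough illustration of that production, the grammar can be transcribed into a regular expression. The helper name is_xpath1_literal is hypothetical and only checks whether a string is a well-formed XPath 1.0 literal; it does not quote anything.

```php
<?php
// Hypothetical helper: true if $s matches production [29] of XPath 1.0,
// i.e. "..." without an inner " or '...' without an inner '.
function is_xpath1_literal(string $s): bool
{
    return preg_match('/\A(?:"[^"]*"|\'[^\']*\')\z/', $s) === 1;
}

var_dump(is_xpath1_literal('"it\'s fine"'));  // single quote inside double quotes
var_dump(is_xpath1_literal('\'say "hi"\'')); // double quotes inside single quotes
var_dump(is_xpath1_literal('"say "hi""'));   // inner " ends the literal early
var_dump(is_xpath1_literal('"say \"hi\""')); // backslash does not escape anything
```

The first two strings are valid literals. The last two are not, because the first inner double quote already terminates the literal.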

Sam Dufel
  • just checking, but if I would want to use quotes in the node values they also need to be html characters right? – janw Jun 25 '14 at 20:41
  • Yes, that's what the `htmlspecialchars()` call is for – Sam Dufel Jun 25 '14 at 20:43
  • Okay, I guess that could work. Is there a way to do that while still being able to select actual quotes? – janw Jun 26 '14 at 10:37
  • That should select actual quotes... You can also just convert all quotes in your selector to single quotes and then wrap them all in double quotes – Sam Dufel Jun 26 '14 at 13:45
  • I had a look at the XPath 1.0 spec (https://www.w3.org/TR/1999/REC-xpath-19991116/#exprlex) and found the part you quoted. However, I cannot find any reference to html entities being decoded inside string literals. Do you have a reference for that? Or is that perhaps just an extension used by some literals? Section 3.6 references the "Character" definition from the XML spec and the "Character normalization" definition from the W3C character model spec, but neither of them specify HTML entity decoding (though the character model spec does recommend to have *some* form of escaping/encoding). – Matthijs Kooijman Apr 30 '19 at 11:12
  • Looking more closely, I wonder if the html entity escaping works at all? I just tried with both simplexml and DomDocument in PHP (7.2), and neither of them seems to work. For simplexml, I used this testcase: `$root = simplexml_load_string(''); print_r($root->xpath("/root[@attr='foo\"bar']")); /* works */ print_r($root->xpath('/root[@attr="foo"bar"]')); /* does not work */` – Matthijs Kooijman Apr 30 '19 at 11:35
1

The above answers are for XPath 1.0, which is the only version PHP supports. For completeness, I'll note that starting with XPath 2.0, string literals can contain quotes by doubling them:

[74]        StringLiteral      ::=      ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'")
[75]        EscapeQuot     ::=      '""'
[76]        EscapeApos     ::=      "''"

For example, to search for the title Some "quoted" title, you would use the following XPath:

/catalog/items/item[title="Some ""quoted"" title"]

This could be implemented with a simple string escape (but I won't give an example, since you're using PHP and as mentioned PHP does not support XPath 2.0).
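
For readers who build the expression in PHP but hand it to a separate XPath 2.0 (or later) processor, the doubling rule comes down to a single str_replace. This is only a sketch: the name xpath2_quote is hypothetical, and the result is not valid for PHP's built-in XPath 1.0 support.

```php
<?php
// Hypothetical helper: quotes a value as an XPath 2.0 string literal by
// doubling every embedded double quote (EscapeQuot in the grammar above).
// Not usable with SimpleXML or DOMXPath, which implement XPath 1.0.
function xpath2_quote(string $value): string
{
    return '"' . str_replace('"', '""', $value) . '"';
}

echo '/catalog/items/item[title=' . xpath2_quote('Some "quoted" title') . ']', "\n";
// prints /catalog/items/item[title="Some ""quoted"" title"]
```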

Matthijs Kooijman
1

This answer is a supplement to hanshenrik's answer. I liked the general solution, but found the example function hard to read and its output not optimal. It does its job perfectly fine nonetheless.

About XPath quoting

XPath 1.0 allows any character inside a string literal except the quote character that delimits it. Both " and ' are allowed as delimiters, so quoting a value that contains at most one kind of quote is trivial. To quote a string that contains both, you have to quote the parts separately, each with a suitable delimiter, and join them with XPath's concat():

He's telling you "Hello world!".

would need to be escaped like

concat("He's telling", ' you "Hello world!".')

It is of course irrelevant where between the ' and the " you split the literal.
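
A quick way to convince yourself of that is to let the XPath engine evaluate two different splits and compare the results with the original string. This is a small sketch using DOMXPath::evaluate(); the empty root document is only there to give the evaluator a context.

```php
<?php
// A throwaway document, only needed so that DOMXPath has something to work on.
$doc = new DOMDocument();
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

$original = 'He\'s telling you "Hello world!".';

// Two different split points between the ' and the first ".
$splitA = 'concat("He\'s telling", \' you "Hello world!".\')';
$splitB = 'concat("He\'s telling you ", \'"Hello world!".\')';

// evaluate() returns a PHP string for expressions that yield an XPath string.
var_dump($xpath->evaluate($splitA) === $original);
var_dump($xpath->evaluate($splitB) === $original);
```

Both comparisons should come out true, since concat() simply joins its arguments.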

Differences of Implementations

hanshenrik's implementation creates the quoted literal by extracting all parts that aren't double quotes and then inserting quoted double quotes. But that can produce undesirable results:

"""x'x"x""xx

would be escaped by their function like

concat('"', '"', '"', "x'x", '"', "x", '"', '"', "xx")

and the example from above:

concat("He's telling you ", '"', "Hello world!", '"', ".")

This implementation, by contrast, minimizes the number of partial literals by alternating the quote character and quoting as much as possible with each one:

for the first example:

concat("He's telling you ", '"Hello world!".')

and for the second example:

concat('"""x', "'x", '"x""xx')

Implementation

/**
 * Creates a properly quoted xpath 1.0 string literal. It prefers double quotes over
 * single quotes. If both kinds of quotes are used in the literal then it will create a
 * compound expression with concat(), using as few partial strings as possible.
 *
 * Based on {@link https://stackoverflow.com/a/54436185/6229450 hanshenrik's StackOverflow answer}.
 *
 * @param string $literal unquoted literal to use in xpath expression
 * @return string quoted xpath literal for xpath 1.0
 */
public static function quoteXPathLiteral(string $literal): string
{
    $firstDoubleQuote = strpos($literal, '"');
    if ($firstDoubleQuote === false) {
        return '"' . $literal . '"';
    }
    $firstSingleQuote = strpos($literal, '\'');
    if ($firstSingleQuote === false) {
        return '\'' . $literal . '\'';
    }
    $currentQuote = $firstDoubleQuote > $firstSingleQuote ? '"' : '\'';
    $quoted = [];
    $lastCut = 0;
    // cut into largest possible parts that contain exactly one kind of quote
    while (($nextCut = strpos($literal, $currentQuote, $lastCut)) !== false) {
        $quotablePart = substr($literal, $lastCut, $nextCut - $lastCut);
        $quoted[] = $currentQuote . $quotablePart . $currentQuote;
        $currentQuote = $currentQuote === '"' ? '\'' : '"'; // toggle quote
        $lastCut = $nextCut;
    }
    $quoted[] = $currentQuote . substr($literal, $lastCut) . $currentQuote;
    return 'concat(' . implode(', ', $quoted) . ')';
}
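
To check a quote function like this one, it helps to round-trip values through a real XPath 1.0 evaluator. Below is a minimal sketch of such a check. The helper name xpath_quote_round_trips is hypothetical, and the deliberately naive quoter only stands in for the function under test so that the snippet runs on its own.

```php
<?php
// Hypothetical test helper: quotes $value with $quote, lets DOMXPath evaluate
// the resulting expression, and reports whether the original value comes back.
function xpath_quote_round_trips(callable $quote, string $value): bool
{
    $doc = new DOMDocument();
    $doc->loadXML('<root/>');
    $xpath = new DOMXPath($doc);
    // An invalid expression yields false plus a warning, suppressed here.
    $result = @$xpath->evaluate($quote($value));
    return $result === $value;
}

// Naive stand-in quoter that always wraps the value in double quotes.
$naive = function (string $v): string {
    return '"' . $v . '"';
};

var_dump(xpath_quote_round_trips($naive, "He's here")); // no " inside: works
var_dump(xpath_quote_round_trips($naive, 'say "hi"'));  // inner " breaks it
```

To test the method above, pass it as the callable instead of $naive, for example as an array of the name of the class that hosts it and 'quoteXPathLiteral', and feed it strings containing both kinds of quotes.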
seyfahni