
I am performing a search in an XML file, using the following code:

$result = $xml->xpath("//StopPoint[contains(StopName, '$query')]");

Where $query is the search query, and StopName is the name of a bus stop. The problem is, it's case sensitive.

And not only that, I would also like to be able to search with non-English characters like ÆØÅæøå to return Norwegian names.

How is this possible?

rebellion
  • For those looking for a solution to this problem, here is an article that discusses an alternative approach: http://codingexplained.com/coding/php/solving-xpath-case-sensitivity-with-php – ba0708 Apr 27 '13 at 19:58

4 Answers


In XPath 1.0 (which is, I believe, the best you can get with PHP SimpleXML), you'd have to use the translate() function to produce all-lowercase output from mixed-case input.

For convenience, I would wrap it in a function like this:

function findStopPointByName($xml, $query) {
  $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"; // add any characters...
  $lower = "abcdefghijklmnopqrstuvwxyzæøå"; // ...that are missing

  $arg_stopname = "translate(StopName, '$upper', '$lower')";
  $arg_query    = "translate('$query', '$upper', '$lower')";

  // Note the closing "]" that ends the predicate.
  return $xml->xpath("//StopPoint[contains($arg_stopname, $arg_query)]");
}

As a sanitizing measure I would either completely forbid or escape single quotes in $query, because they will break your XPath string if they are ignored.
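To make the sanitizing step concrete, here is a minimal sketch that refuses queries containing a single quote before the expression is built. The sample XML, the function name findStopPointByNameSafe and the choice to throw an exception are assumptions made for illustration; they are not part of the original answer. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample data; the real file layout is assumed to look similar.
$xml = simplexml_load_string(
    '<StopPoints>
  <StopPoint><StopName>Ørjaveita</StopName></StopPoint>
  <StopPoint><StopName>Munkegata</StopName></StopPoint>
</StopPoints>'
);

// Hypothetical variant of the wrapper above that forbids single quotes.
function findStopPointByNameSafe(SimpleXMLElement $xml, string $query): array {
    // A single quote would end the XPath string literal early,
    // so this sketch simply rejects such queries.
    if (strpos($query, "'") !== false) {
        throw new InvalidArgumentException('Query must not contain single quotes.');
    }

    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ';
    $lower = 'abcdefghijklmnopqrstuvwxyzæøå';

    $name   = "translate(StopName, '$upper', '$lower')";
    $needle = "translate('$query', '$upper', '$lower')";

    $result = $xml->xpath("//StopPoint[contains($name, $needle)]");
    return $result ?: []; // normalise a failed lookup to an empty array
}

$hits = findStopPointByNameSafe($xml, 'øRJA');
foreach ($hits as $stop) {
    echo (string) $stop->StopName, "\n";
}
```

Rejecting the quote is the bluntest option; whether to reject, strip or escape depends on whether stop names with apostrophes need to be searchable.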

Tomalak

In XPath 2.0 you can use the lower-case() function, which is Unicode-aware, so it will handle non-ASCII characters fine.

contains(lower-case(StopName), lower-case('$query'))

To use XPath 2.0 you need an XSLT 2.0 processor, for example Saxon, which you can access from PHP via JavaBridge.
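If bringing in a Java processor is not an option, a different technique (not XPath 2.0, and not what this answer proposes) is to stay on XPath 1.0 and call PHP's own mb_strtolower() from inside the expression through DOMXPath::registerPhpFunctions(). The sketch below assumes the dom and mbstring extensions are available, uses made-up sample data, and drops apostrophes from the query for simplicity.

```php
<?php
// Hypothetical sample data standing in for the real stop list.
$doc = new DOMDocument();
$doc->loadXML(
    '<StopPoints>
  <StopPoint><StopName>Åsveien skole</StopName></StopPoint>
  <StopPoint><StopName>Munkegata</StopName></StopPoint>
</StopPoints>'
);

$xpath = new DOMXPath($doc);
// Expose PHP functions to XPath under the "php" prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only this one function to be called from XPath.
$xpath->registerPhpFunctions('mb_strtolower');

// Lowercase the query in PHP; apostrophes are removed so the literal stays valid.
$query  = 'ÅSVEIEN';
$needle = mb_strtolower(str_replace("'", '', $query), 'UTF-8');

// php:functionString() passes the node's string value to the PHP function.
$nodes = $xpath->query(
    "//StopPoint[contains(php:functionString('mb_strtolower', StopName), '$needle')]"
);

foreach ($nodes as $node) {
    echo trim($node->textContent), "\n";
}
```

Because the case folding happens in PHP, no alphabet strings have to be maintained, at the cost of a PHP callback for every candidate node.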

vartec

Non-English names should not be a problem. Just add them to your XPath. (XML is defined as using Unicode).

As for case-insensitivity, ...

XPath 1.0 includes the following statement:

Two strings are equal if and only if they consist of the same sequence of UCS characters.

So even using explicit predicates on the local-name will not help.

XPath 2 includes functions to map case. E.g. fn:upper-case


Additional: using XPath's translate() function should allow case mapping to be faked in XPath 1.0, but the mapping strings will need to include every cased code point you and your users will ever need. Note that the literal being compared must already be in the target case:

"TEST" = translate($inputString, "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

Richard
  • As I commented below, PHP tells me that the function lower-case and upper-case can't be found.. :/ – rebellion Mar 09 '09 at 12:38
  • @termserv: XML is *always* unicode. Even if your XML files are not in a Unicode-capable encoding, once in memory this will make no difference. – Richard Mar 09 '09 at 13:32
  • @Richard: An up-vote for the answer you took the "translate()" idea from would have been fair. – Tomalak Mar 09 '09 at 13:47
  • @Tomalak: I forgot, sorry, but asking for an up-vote pretty much negates it. – Richard Mar 09 '09 at 14:14
  • I know. ;-) It's also not that I would desperately need it (in fact, if you had simply credited me without up-voting it would have been okay). Maybe I should have made a smiley right away, as it wasn't meant to be aggressive or anything. – Tomalak Mar 09 '09 at 14:43
  • Your translate clause is not quite right, your alphabet is little screwed up. `translate(..,'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')` – Andrew Mar 18 '13 at 14:52
  • @Andrew The order was the same (so would work), but correctly in alphabetical order is better.... – Richard Mar 19 '13 at 12:48

In addition:

$xml->xpath("//StopPoint[contains(StopName, '$query')]");

You will need to strip out any apostrophe characters from $query to avoid breaking your expression.

In XPath 2.0 you can double-up the quote being used in the delimiter to put that quote into a string literal, but in XPath 1.0 it's impossible to include the delimiter in the string.
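If stripping apostrophes is too lossy, one possible workaround in XPath 1.0 is to pick whichever quote character the value does not contain, and to fall back to concat() when it contains both. The helper name xpathLiteral and the sample stop name below are hypothetical; this is a sketch of the idea rather than a hardened implementation.

```php
<?php
// Hypothetical helper: turn an arbitrary PHP string into an XPath 1.0 string expression.
function xpathLiteral(string $value): string {
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";   // no apostrophe: single-quote it
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';   // no double quote: double-quote it
    }
    // Both kinds present: split on apostrophes and join the pieces with
    // concat(), supplying each apostrophe as the double-quoted literal "'".
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Made-up sample data with an apostrophe in the stop name.
$xml = simplexml_load_string(
    '<StopPoints><StopPoint><StopName>St. Olav\'s gate</StopName></StopPoint></StopPoints>'
);

$query = "Olav's";
$hits  = $xml->xpath('//StopPoint[contains(StopName, ' . xpathLiteral($query) . ')]');

foreach ($hits as $stop) {
    echo (string) $stop->StopName, "\n";
}
```

The same helper could wrap $query in the translate()-based expressions from the other answers, since it returns a complete string expression rather than a bare value.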

bobince