Case-insensitive XPath contains() possible?

Question

I'm iterating over all text nodes of my DOM and checking whether each nodeValue contains a certain string.

/html/body//text()[contains(.,'test')]

This is case-sensitive. However, I also want to catch Test, TEST or TesT. Is that possible with XPath (in JavaScript)?

Tomalak · Accepted Answer · 2020-01-07T15:06:57.053

This is for XPath 1.0. If your environment supports XPath 2.0, see the XPath 2.0 answer below.

Yes. Possible, but not beautiful.

/html/body//text()[
  contains(
    translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'test'
  )
]

This works when the alphabet is known beforehand. Add any accented characters you expect to see to both strings, at matching positions.
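To see why the alphabet has to be known up front, here is a small JavaScript model of how XPath 1.0 translate() behaves. The helper name xpathTranslate is hypothetical and written only for illustration (it is not a browser API): each character found in the second argument is replaced by the character at the same position in the third argument, characters without a counterpart are dropped, and every other character passes through unchanged.

```javascript
// Hypothetical model of XPath 1.0 translate(s, from, to), for illustration only.
function xpathTranslate(s, from, to) {
  const fromChars = Array.from(from);
  const toChars = Array.from(to);
  const map = new Map();
  fromChars.forEach((ch, i) => {
    // If a character repeats in `from`, its first occurrence wins.
    if (!map.has(ch)) {
      // No counterpart in `to` means the character is removed (mapped to "").
      map.set(ch, i < toChars.length ? toChars[i] : "");
    }
  });
  // Characters not listed in `from` are copied unchanged.
  return Array.from(s)
    .map((ch) => (map.has(ch) ? map.get(ch) : ch))
    .join("");
}

const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

console.log(xpathTranslate("TesT", UPPER, LOWER)); // prints test
console.log(xpathTranslate("TÉST", UPPER, LOWER)); // prints tÉst (É is not in UPPER)
```

The second call shows the limitation: an accented capital that is missing from the mapping strings survives unchanged, so a search for 'tést' would not match.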

If you can, mark the text that interests you with some other means, like enclosing it in a <span> that has a certain class while building the HTML. Such things are much easier to locate with XPath than substrings in the element text.
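As an illustration of that approach, suppose the HTML were generated with something like <span class="match">. The common XPath 1.0 idiom for testing a single class token pads the class attribute with spaces so that "match" does not also hit "mismatch". The sketch below only builds that expression string; the class name "match" and the helper byClassXPath are hypothetical, and it assumes the class name contains no quotes or spaces.

```javascript
// Hypothetical helper: builds an XPath 1.0 expression selecting <span>
// elements whose class attribute contains className as a whole token.
// Assumes className contains no quotes or whitespace.
function byClassXPath(className) {
  // normalize-space() collapses whitespace runs; the padding spaces make
  // ' match ' match only the complete token, not e.g. "mismatch".
  return "//span[contains(concat(' ', normalize-space(@class), ' '), ' " +
         className + " ')]";
}

console.log(byClassXPath("match"));
// prints //span[contains(concat(' ', normalize-space(@class), ' '), ' match ')]
```

In a browser, the resulting string could be passed to document.evaluate() like any other XPath expression.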

If that's not an option, you can let JavaScript (or any other host language that you are using to execute XPath) help you build a dynamic XPath expression:

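// "$u", "$l" and "$s" are placeholders for the upper-cased search string,
// the lower-cased search string, and the lower-cased string to look for.
function xpathPrepare(xpath, searchString) {
  // Replacement functions stop "$" sequences in searchString from being
  // interpreted as special patterns by String.prototype.replace.
  return xpath.replace("$u", () => searchString.toUpperCase())
              .replace("$l", () => searchString.toLowerCase())
              .replace("$s", () => searchString.toLowerCase());
}

const xp = xpathPrepare("//text()[contains(translate(., '$u', '$l'), '$s')]", "Test");
// -> "//text()[contains(translate(., 'TEST', 'test'), 'test')]"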

(Hat tip to @KirillPolishchuk's answer: of course you only need to translate those characters you're actually searching for.)

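This approach works for any search string whatsoever, without requiring prior knowledge of the alphabet, which is a big plus (as long as upper- and lower-casing map each character to exactly one character; German ß, for instance, upper-cases to SS).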

Both of the methods above fail when search strings can contain single quotes, in which case things get more complicated.
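XPath 1.0 string literals have no escape syntax, so a string containing both ' and " has to be assembled with concat(). A minimal sketch, assuming the search string comes from JavaScript; xpathLiteral and caseInsensitiveContains are hypothetical helper names, and the case mapping still relies on toUpperCase()/toLowerCase() producing one character per character.

```javascript
// Hypothetical helper: turns any JavaScript string into an XPath 1.0
// expression that evaluates to exactly that string.
function xpathLiteral(s) {
  if (!s.includes("'")) return "'" + s + "'"; // no single quotes: wrap in '...'
  if (!s.includes('"')) return '"' + s + '"'; // no double quotes: wrap in "..."
  // Both quote types: split on ' and glue the pieces back with "'" via concat().
  const parts = s.split("'").map((part) => "'" + part + "'");
  return "concat(" + parts.join(", \"'\", ") + ")";
}

// Hypothetical helper: case-insensitive contains() that tolerates quotes.
function caseInsensitiveContains(searchString) {
  const upper = xpathLiteral(searchString.toUpperCase());
  const lower = xpathLiteral(searchString.toLowerCase());
  return "//text()[contains(translate(., " + upper + ", " + lower + "), " + lower + ")]";
}

console.log(caseInsensitiveContains("O'Neil"));
// prints //text()[contains(translate(., "O'NEIL", "o'neil"), "o'neil")]
```

Because translate() and contains() accept any string expression, the concat() form can be used for all three arguments.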

Thanks! The addition of translating only the needed characters is also nice. I'd be curious what the performance gain is. Note that xpathPrepare() could handle characters that appear more than once differently (e.g. you get TEEEEEST and teeeeest). — Aron Woost, Dec 12 '11 at 13:37
@AronWoost: Well, there might be some gain, just benchmark it if you are eager to find out. `translate()` itself does not care how often you repeat each character - `translate(., 'EE', 'ee')` is absolutely equivalent to `translate(., 'E', 'e')`. *P.S.: Don't forget to up-vote @KirillPolishchuk, the idea was his.* — Tomalak, Dec 12 '11 at 14:19
System.Xml.XmlNodeList x = mydoc.SelectNodes("//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜÉÈÊÀÁÂÒÓÔÙÚÛÇÅÏÕÑŒ', 'abcdefghijklmnopqrstuvwxyzäöüéèêàáâòóôùúûçåïõñœ'),'foo')]"); — Stefan Steiger, Nov 29 '13 at 09:34
No. See the *"of course you only need to translate those characters you're actually searching for"* part. — Tomalak, Nov 29 '13 at 10:10
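Picking up the point about repeated characters from the comments above: since translate() ignores repeats, removing them only shortens the expression. One way to build the two translate() arguments so they stay aligned is to walk the search string and keep one upper/lower pair per distinct character. translatePairs is a hypothetical helper, and skipping characters whose case mapping is not one-to-one (such as ß, which upper-cases to "SS") is a design choice of this sketch, not something the answer prescribes.

```javascript
// Hypothetical helper: builds aligned `from`/`to` arguments for translate(),
// listing each case-variable character of searchString exactly once.
function translatePairs(searchString) {
  let from = "";
  let to = "";
  const seen = new Set();
  for (const ch of searchString) {
    const upper = ch.toUpperCase();
    const lower = ch.toLowerCase();
    // Skip characters without case (upper === lower) and ones already added.
    if (upper === lower || seen.has(upper)) continue;
    // Skip characters whose case mapping is not one-to-one (e.g. ß -> SS).
    if (Array.from(upper).length !== 1 || Array.from(lower).length !== 1) continue;
    seen.add(upper);
    from += upper;
    to += lower;
  }
  return { from, to };
}

console.log(translatePairs("TEEEEEST")); // prints { from: 'TES', to: 'tes' }
```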

kjhughes · Answer 2 · score 74 · 2023-07-16T13:12:30.327

Modern XPath 2.0 (and higher) Solutions

Use lower-case():

/html/body//text()[contains(lower-case(.),'test')]

Use matches() regex matching with its case-insensitive flag:

/html/body//text()[matches(.,'test', 'i')]

(For older XPath-1.0-limited environments, see the translate() technique described in @Tomalak's answer.)

edited Jul 16 '23 at 13:12

answered Apr 30 '14 at 13:07

kjhughes

Is this syntax not supported in Firefox and Chrome? I just tried it in the console and they both return a syntax error. – d-b Jun 08 '19 at 11:51
Firefox and Chrome only implement XPath 1.0. – kjhughes Aug 07 '19 at 12:17
Where can I verify that this will work as expected? – Ankit Gupta Oct 13 '20 at 18:04
@AnkitGupta: Any online or offline tool that supports XPath 2.0 can be used to verify this answer, of course, but (1) tool recommendations are off-topic here on SO and (2) given the 56 upvotes, 0 downvotes, and no dissenting comments in over six years, you can be pretty confident that this answer is correct. ;-) – kjhughes Oct 13 '20 at 18:45
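Because browsers stop at XPath 1.0, another option there is to let XPath collect the text nodes and do the case-insensitive test in JavaScript. A sketch, assuming the nodes expose nodeValue as DOM text nodes do; filterByTextIgnoreCase is a hypothetical name, and toLowerCase() here plays roughly the role of lower-case() in the XPath 2.0 expression above.

```javascript
// Hypothetical helper: keeps the nodes whose nodeValue contains `needle`,
// ignoring case. Works on DOM text nodes or any objects with a nodeValue.
function filterByTextIgnoreCase(nodes, needle) {
  const wanted = needle.toLowerCase();
  return Array.from(nodes).filter(
    (node) => (node.nodeValue || "").toLowerCase().includes(wanted)
  );
}

// Browser usage (not runnable in Node): collect text nodes with XPath 1.0,
// then filter them in JavaScript.
//
// const result = document.evaluate("/html/body//text()", document, null,
//     XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// const nodes = [];
// for (let i = 0; i < result.snapshotLength; i++) nodes.push(result.snapshotItem(i));
// console.log(filterByTextIgnoreCase(nodes, "test"));
```

This trades a single XPath query for a JavaScript loop, but it handles any alphabet that toLowerCase() handles.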

Kirill Polishchuk · Answer 3 · score 68 · 2021-09-08T07:07:39.370

Case-insensitive contains

/html/body//text()[contains(translate(., 'EST', 'est'), 'test')]

edited Sep 08 '21 at 07:07

answered Dec 12 '11 at 12:49

Kirill Polishchuk

+1 Absolutely. That's something I did not think of. *(I'll use that in my answer, this is much better than the original JavaScript routine I wrote)* – Tomalak Dec 12 '11 at 13:02
wouldn't it just convert `TEST` to `test` and leave `Test` as it is? – Muhammad Adeel Zahid Feb 27 '13 at 19:10
@MuhammadAdeelZahid - No, it's replacing "T" with "t", "E" with "e", etc. It's a 1-to-1 match. – Daniel Haley Apr 17 '13 at 19:24
It might be more clear to do `translate(., 'TES', 'tes')`. That way people will realize it's not a word translation, that it's a letter translation. – mlissner Jun 01 '17 at 23:51
Or 'EST', 'est', though it does look cool (albeit a bit cryptic) that part of the searched term appears in the mapping (with the repeated letters removed). – George Birbilis Sep 21 '20 at 20:48
we need `icontains()` :-) – But those new buttons though.. Sep 07 '21 at 18:35

score 11 · Answer 4 · answered Dec 12 '11 at 12:14

Yes. You can use translate to convert the text you want to match to lower case as follows:

/html/body//text()[contains(translate(., 
                                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                      'abcdefghijklmnopqrstuvwxyz'),
                   'test')]

score 7 · Answer 5 · edited Sep 30 '21 at 16:19

The way I always did this was by using the translate() function in XPath. I won't say it's very pretty, but it works correctly.

/html/body//text()[contains(translate(.,'abcdefghijklmnopqrstuvwxyz',
                                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),'TEST')]

hope this helps,

edited Sep 30 '21 at 16:19

Endre Both

answered Dec 12 '11 at 12:12

Marvin Smit


Michael Kay · Answer 6 · score 7 · 2019-06-07T18:36:05.377

If you're using XPath 2.0, then you can specify a collation as the third argument to contains(). However, collation URIs are not standardized, so the details depend on the product that you are using.

Note that the solutions given earlier using translate() all assume that you are only using the 26-letter English alphabet.

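UPDATE: XPath 3.1 defines a standard collation URI for case-blind matching: http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive (it ignores case for the ASCII letters A-Z only).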

edited Jun 07 '19 at 18:36

answered Dec 12 '11 at 14:52

Michael Kay

