8

I can't figure out how to search for text containing single quotes using XPath.

For example, I've added a quote to the title of this question. The following line

$x("//*[text()='XQuery looking for text with 'single' quote']")

Returns an empty array.

However, if I try the following

$x("//*[text()=\"XQuery looking for text with 'single' quote\"]")

It does return the link for the title of the page, but I would like to be able to accept both single and double quotes in there, so I can't just tailor it for the single/double quote.

You can try it in Chrome's or Firebug's console on this page.

Richard J. Ross III
Ruan Mendes
  • Your first expression should have worked in a valid XPath 1.0 parser. You do not specify whether you use 1.0 or 2.0... From the documentation: `To avoid a quotation mark in an expression being interpreted by the XML processor as terminating the attribute value the quotation mark can be entered as a character reference (&quot; or &apos;).` – Alexis Wilke Nov 28 '13 at 21:18

6 Answers

13

Here's a hackaround (thanks, Dimitre Novatchev) that allows me to search for any text in XPath expressions, whether it contains single or double quotes. It is implemented in JS, but could easily be translated to other languages.

function cleanStringForXpath(str)  {
    var parts = str.match(/[^'"]+|['"]/g);
    parts = parts.map(function(part){
        if (part === "'")  {
            return '"\'"'; // output "'"
        }

        if (part === '"') {
            return "'\"'"; // output '"'
        }
        return "'" + part + "'";
    });
    return "concat(" + parts.join(",") + ")";
}

If I'm looking for the text I'm reading "Harry Potter", I could do the following:

var xpathString = cleanStringForXpath( "I'm reading \"Harry Potter\"" );
$x("//*[text()="+ xpathString +"]");
// The xpath created becomes 
// //*[text()=concat('I',"'",'m reading ','"','Harry Potter','"')]
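
As a comment on this answer points out, XPath 1.0's concat() needs at least two arguments, so the function above produces an invalid expression for text with no quotes at all, and it throws on an empty string because match() returns null. A minimal sketch of a variant that covers those cases follows. The name toXPathLiteral is hypothetical and not part of any library.

```javascript
// Hypothetical helper: builds an XPath 1.0 string expression for any input.
function toXPathLiteral(str) {
    // No single quote: a plain single-quoted literal is enough (covers "").
    if (str.indexOf("'") === -1) {
        return "'" + str + "'";
    }
    // No double quote: a plain double-quoted literal is enough.
    if (str.indexOf('"') === -1) {
        return '"' + str + '"';
    }
    // Both kinds present: split on double quotes and join the pieces
    // with a single-quoted double quote inside concat().
    var parts = str.split('"').map(function (part) {
        return '"' + part + '"';
    });
    return "concat(" + parts.join(", '\"', ") + ")";
}

console.log(toXPathLiteral(""));
// prints ''
console.log(toXPathLiteral("plain"));
// prints 'plain'
console.log(toXPathLiteral("I'm reading \"Harry Potter\""));
// prints concat("I'm reading ", '"', "Harry Potter", '"', "")
```

Because concat() is only used when the text contains at least one double quote, the split always yields two or more pieces, so the two-argument minimum is always met.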

Here's a (much shorter) Java version. It's exactly the same as JavaScript, if you remove type information. Thanks to https://stackoverflow.com/users/1850609/acdcjunior

String escapedText = "concat('"+originalText.replace("'", "', \"'\", '") + "', '')";
Community
Ruan Mendes
  • Juan, I am glad that my answer to your comments led to the solution. Please, consider accepting my answer. – Dimitre Novatchev Nov 20 '12 at 23:05
  • 1
    @DimitreNovatchev Your answer doesn't have the information I put in my answer; the real trick is hidden in the comments. I based my answer off your answer, but your answer is not specifically answering my question. If you improve yours to address my question specifically, I will accept it. – Ruan Mendes Nov 20 '12 at 23:08
  • This is the best one could do in XPath 1.0. If the expression was part of an XML document, it would be possible to use the two entities `&quot;` and `&apos;` -- but this isn't your case. I could provide a C# solution, but you seem to be using JavaScript, in which I am not too fluent. – Dimitre Novatchev Nov 20 '12 at 23:11
  • @DimitreNovatchev I'm just saying that for your answer to actually answer the question, it would have to at least specify that you need to wrap your entire string with `concat()` and then you can run a replace to change `'` and `"` into `"'"` and `'"'`. You wouldn't have to come up with code as I did. JS was just the quickest way to explain what it takes. I'm actually writing a Java version which is what I'm really going to use. Right now the answer looks like you misunderstood the question – Ruan Mendes Nov 20 '12 at 23:13
  • Juan Mendes, sure -- this is not very complex C# work. Another way is to replace ' and " with two corresponding strings (preferably single characters, so that the XPath `translate()` function can be used on one of the two ends) that are guaranteed not to occur in the text nodes. This would require two chained calls to `translate()` in the XPath expression and two calls to a `replace()` function on the programming-language end. – Dimitre Novatchev Nov 20 '12 at 23:19
  • 1
    This rationale is very useful! Helped a lot! I used this in Java: `String escapedText = "concat('"+originalText.replace("'", "', \"'\", '") + "', '')";`! – acdcjunior Apr 16 '14 at 18:45
  • @ktxmatrix such an extended edit of an existing answer would better be a separate answer altogether. – lfurini Dec 21 '17 at 16:01
  • I think this snippet needs some extra work. I see two cases where it fails: 1) `str = ""`, 2) any `str` without quotes, since `concat` requires at least two arguments. – tokland Jun 04 '18 at 21:43
8

In XPath 2.0 and XQuery 1.0, the delimiter of a string literal can be included in the string literal by doubling it:

let $a := "He said ""I won't"""

or

let $a := 'He said "I can''t"'

The convention is borrowed from SQL.
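
If the engine does support XPath 2.0, generating such a literal from arbitrary text is a single replace. Here is a minimal sketch in JavaScript; the function name is hypothetical, and it assumes the resulting expression is handed to a 2.0 processor rather than to a browser's XPath 1.0 engine. For XQuery, an ampersand in the text would additionally have to be written as &amp;, which this sketch does not do.

```javascript
// Hypothetical helper: wraps text in double quotes for XPath 2.0,
// doubling every embedded double quote. Single quotes need no escaping here.
function toXPath2Literal(str) {
    return '"' + str.replace(/"/g, '""') + '"';
}

console.log(toXPath2Literal('He said "I won\'t"'));
// prints "He said ""I won't"""
```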

Michael Kay
  • 1
    This answer is interesting. But I use Selenium with Firefox, and alas they seem to support XPath but not XPath 2. I say they seem, this is very scarcely documented. – Nicolas Barbulesco Nov 21 '13 at 16:24
6

This is an example:

/*/*[contains(., "'") and contains(., '"') ]/text()

When this XPath expression is applied on the following XML document:

<text>
    <t>I'm reading "Harry Potter"</t>
    <t>I am reading "Harry Potter"</t>
    <t>I am reading 'Harry Potter'</t>
</text>

the wanted, correct result (a single text node) is selected:

I'm reading "Harry Potter"

Here is verification using the XPath Visualizer (A free and open source tool I created 12 years ago, that has taught XPath the fun way to thousands of people):

[screenshot: XPath Visualizer output]

Your problem may be that you are not able to specify this XPath expression as string in the programming language that you are using -- this isn't an XPath problem but a problem in your knowledge of your programming language.

Dimitre Novatchev
  • 2
    Your answer shows how to find text nodes that contain both a single and a double quote, by hardcoding single quotes inside double quotes (`"'"`) and double quotes inside single quotes (`'"'`). However, what I need is a query that would search for the specific text with a query like `//div[text()="I'm reading "Harry Potter""]`... obviously, my example is not properly escaping the quotes. I would expect `//*[text()='I'm reading "Harry Potter"']` to work – Ruan Mendes Nov 20 '12 at 21:43
  • 2
    No, the problem is not in "my knowledge of my programming language". The question is about how to escape quotes inside quoted content in XPath – Ruan Mendes Nov 20 '12 at 21:47
  • @JuanMendes: Use: `/*/*[.=concat("I'm reading ", '"Harry Potter"')]/text()` . In addition to this, in XPath 2.0 (and this means also in XQuery and XSLT 2.0) a quote is escaped simply by doubling it. – Dimitre Novatchev Nov 20 '12 at 21:48
  • 1
    To do that, it would require knowing which part of the string contains double or single quotes. I can't do that, the text to search for is beyond my control, it's handed to a method and I have to create an XPATH for it. Again, I cannot just use the reverse quoting, because that requires knowing the string to search for in advance – Ruan Mendes Nov 20 '12 at 21:50
  • @JuanMendes, your method can find every occurrence of a quote and apostrophe -- therefore it *can* generate the XPath expression that contains the `concat()` function. In case you are generating an XPath 2.0 expression, simply double every quote in the string -- this is a simple `replace()` function invocation. – Dimitre Novatchev Nov 20 '12 at 21:53
  • 1
    Double quoting does not work, not sure if XPath 2.0 is supported in browsers. The following does not yield any results: `$x("//*[text()=\"XQuery looking for text with ''single'' quote\"]")` – Ruan Mendes Nov 20 '12 at 21:54
  • 1
    XPath 2.0 is not supported in any browser. I was surprised by your use of the term XQuery together with Chrome and Firebug. – Dimitre Novatchev Nov 20 '12 at 21:56
  • If I can't find another way, I will break up the string into all the necessary parts and combine them with the required `concat("'")` and `concat('"')` that you suggested. I'm using Selenium, by the way. I use `$x()` to test it without having to run Selenium – Ruan Mendes Nov 20 '12 at 21:57
  • @JuanMendes, Yes, this can be done in C# -- not entirely trivial, but possible. I did something similar many years ago using C++. – Dimitre Novatchev Nov 20 '12 at 22:01
  • +1 Though it's not exactly what I was looking for, there's something I can fall back to in the comments – Ruan Mendes Nov 20 '12 at 22:10
  • This answer does not answer the question. I want to write a `'` in `'this string'`. @Juan, I use Selenium too. – Nicolas Barbulesco Nov 21 '13 at 16:10
  • @NicolasBarbulesco, I would recommend that you ask a separate question. It isn't clear from your comment what exactly you need to find and in what. As for whether this answer answers the question, please read the final solution by the submitter of the question, where he says: "Here's a hackaround (Thanks Dimitre Novatchev)". – Dimitre Novatchev Nov 22 '13 at 05:12
1

Additionally, if you were using XQuery instead of XPath, as the title says, you could also use the XML entities:

   "&quot; for double and &apos; for single quotes"

They also work within single-quoted strings.

BeniBela
  • 1
    I'm not sure what you mean by using XQuery instead of XPath, can you expand on that? I'm writing automation tests using Selenium – Ruan Mendes Nov 26 '12 at 18:38
  • Well, you mentioned XQuery in the title. I don't know if Selenium supports XQuery. Anyway, the strings there support basic XML entities, while those of XPath do not. (Compare the [XQuery](http://www.w3.org/TR/2010/REC-xquery-20101214/#id-literals) and [XPath](http://www.w3.org/TR/xpath20/#id-literals) standards.) – BeniBela Nov 26 '12 at 21:49
1

You can do this using a regular expression. For example (as TypeScript, using ES6 template literals):

export function escapeXPathString(str: string): string {
    str = str.replace(/'/g, `', "'", '`);

    return `concat('${str}', '')`;
}

This replaces all ' in the input string by ', "'", '.

The final , '' is important because concat('string') is an error.
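
To make the effect of the trailing '' concrete, here is a plain-JavaScript port of the function above with the type annotations removed, as a sketch rather than a definitive implementation. It shows that text without any apostrophe still yields a valid two-argument concat() call.

```javascript
// Plain-JavaScript port of the function above (type annotations removed).
function escapeXPathString(str) {
    // Close the literal, emit a double-quoted apostrophe, reopen the literal.
    return "concat('" + str.replace(/'/g, "', \"'\", '") + "', '')";
}

console.log(escapeXPathString("no quotes"));
// prints concat('no quotes', '')
console.log(escapeXPathString("I'm reading \"Harry Potter\""));
// prints concat('I', "'", 'm reading "Harry Potter"', '')
```

Double quotes need no special handling here because every generated literal, apart from the apostrophe itself, is single-quoted.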

Danielle Madeley
0

I was on the same quest, and after a while I found that XPath has no support for this, which is quite disappointing. But we can always work around it.

I wanted something simple and straightforward. What I came up with is to define your own replacement for the apostrophe: a unique code that you will not encounter in your XML text. I chose //apos//, for example. You put that code in both your XML text and your XPath query. (If you did not write the XML yourself, you can always substitute it with the replace function of any editor.) Then you search normally with it, retrieve the result, and replace //apos// back with '.

Below are some samples from what I was doing (replace_special_char_xpath() is the function you need to write):

function replace_special_char_xpath($str){
    // Turn the placeholder back into a real apostrophe
    $str = str_replace("//apos//","'",$str);
    /* add all other replacements here */
    return $str;
}

function xml_lang($xml_file,$category,$word,$language){ // path can be relative or absolute
    $language = str_replace("-","_",$language); // replace - with _ so that "en-us" etc. can be used
    $xml = simplexml_load_file($xml_file);
    // {$var} interpolation; the ${var} form is deprecated as of PHP 8.2
    $xpath_result = $xml->xpath("{$category}/def[en_us = '{$word}']/{$language}");
    $result = $xpath_result[0][0];
    return replace_special_char_xpath($result);
}

The text in the XML file:

<def>
     <en_us>If you don//apos//t know which server, Click here for automatic connection</en_us>
     <fr_fr>Si vous ne savez pas quelle serveur, Cliquez ici pour une connexion automatique</fr_fr>
     <ar_sa>إذا لا تعرفوا أي سرفير, إضغطوا هنا من أجل إتصال تلقائي</ar_sa>
</def>

And the call in the PHP file (which generates the HTML):

<span><?php echo xml_lang_body("If you don//apos//t know which server, Click here for automatic connection")?>
Mohamed Allal