12

I need to locate a node within an XML file by its value using XPath. The problem arises when the node's value contains whitespace. For example:

<Root>
  <Child>value</Child>
  <Child>value with spaces</Child>
</Root>

I cannot construct an XPath expression that locates the second Child node.

The simple XPath /Root/Child works perfectly for both children, but /Root[Child=value with spaces] returns an empty collection.

I have already tried escaping the spaces with %20, &#20; and &nbsp;, and wrapping the value in single and double quotes.

Still no luck.

Does anybody have an idea?

zerkms
user15108

7 Answers

21

Depending on your exact situation, different XPath expressions will select a node whose value contains whitespace.

First, let us recall that any one of these characters is "whitespace":

    &#x09; -- the Tab

    &#xA; -- newline

    &#xD; -- carriage return

    ' ' or &#x20; -- the space

If you know the exact value of the node, say it is "Hello World" with a single space, then the most direct XPath expression:

     /top/aChild[. = 'Hello World']

will select this node.
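As a minimal sketch of how this expression could be evaluated, here is a Java example using the JDK's built-in javax.xml.xpath API. The sample document, the class name ExactValueDemo and the helper countMatches are made up for illustration and are not part of the question. The point it tries to show is that the value is a quoted string literal inside the predicate, and the space inside the quotes needs no escaping.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExactValueDemo {
    // Hypothetical sample document, chosen to mirror the /top/aChild example.
    static final String XML =
        "<top><aChild>Hello</aChild><aChild>Hello World</aChild></top>";

    // Evaluates an XPath 1.0 expression and returns how many nodes it selects.
    static int countMatches(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The string literal is quoted inside the XPath; the space is part of the literal.
        System.out.println(countMatches(XML, "/top/aChild[. = 'Hello World']")); // prints 1
    }
}
```

Without the quotes, Hello World is not read as a string at all, because bare names in XPath are treated as element names.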

The difficulties with specifying a value that contains whitespace, however, come from the fact that we see all whitespace characters just as ... well, whitespace, and don't know whether it is a group of spaces or a single tab.

In XPath 2.0 one may use regular expressions, and they provide a simple and convenient solution. Thus we can use an XPath 2.0 expression such as the one below:

    /*/aChild[matches(., "Hello\sWorld")]

to select any child of the top node, whose value is the string "Hello" followed by whitespace followed by the string "World". Note the use of the matches() function and of the "\s" pattern that matches whitespace.
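The XPath engine that ships with the JDK implements XPath 1.0 only, so matches() is not available there. A rough stand-in, sketched below, is to select the candidate nodes with XPath 1.0 and filter them with java.util.regex. This is a swapped-in technique, not XPath 2.0 itself; a third-party XPath 2.0 processor such as Saxon would be needed to run the expression above as written. The class name RegexFilterDemo, the helper matchingValues and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RegexFilterDemo {
    // Note: Java's \s also matches form feed and vertical tab,
    // which is slightly wider than \s in XPath 2.0 regular expressions.
    static final Pattern HELLO_WORLD = Pattern.compile("Hello\\sWorld");

    // Selects every aChild of the top element, then keeps the values
    // in which the pattern occurs anywhere (unanchored, like matches()).
    static List<String> matchingValues(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/*/aChild", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getTextContent();
                if (HELLO_WORLD.matcher(value).find()) {
                    result.add(value);
                }
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: one tab-separated, one space-separated, one without whitespace.
        String xml = "<top><aChild>Hello\tWorld</aChild>"
                   + "<aChild>Hello World</aChild><aChild>HelloWorld</aChild></top>";
        System.out.println(matchingValues(xml).size()); // prints 2
    }
}
```

Since \s matches exactly one character, a value with several consecutive whitespace characters would need \s+ in the pattern.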

In XPath 1.0 a convenient test for whether a given string contains any whitespace characters is:

not(string-length(.) = string-length(translate(., ' &#9;&#xA;&#xD;','')))

Here we use the translate() function to eliminate any of the four whitespace characters, and compare the length of the resulting string to that of the original string.
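Here is a small Java sketch of the same test, assuming the JDK's javax.xml.xpath API. One detail worth noting: character references such as &#9; are resolved by an XML parser, so they only work when the XPath expression itself sits inside an XML document (an XSLT stylesheet, for instance). When the expression is built in host-language code, the actual characters have to be placed inside the XPath string literal, here through Java's \t, \n and \r escapes. The class name WhitespaceTestDemo, the helper countWithWhitespace and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceTestDemo {
    // The XPath literal ' \t\n\r' holds the four real whitespace characters,
    // because the Java compiler expands the escapes before XPath sees them.
    static final String HAS_WHITESPACE =
        "/*/aChild[not(string-length(.) = string-length(translate(., ' \t\n\r', '')))]";

    // Counts the aChild elements whose value contains at least one whitespace character.
    static int countWithWhitespace(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                HAS_WHITESPACE, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: only the second and third values contain whitespace.
        String xml = "<top><aChild>NoSpace</aChild>"
                   + "<aChild>Has space</aChild><aChild>Has\ttab</aChild></top>";
        System.out.println(countWithWhitespace(xml)); // prints 2
    }
}
```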

So, if in a text editor a node's value is displayed as

"Hello    World",

we can safely select this node with the XPath expression:

/*/aChild[translate(., ' &#9;&#xA;&#xD;','') = 'HelloWorld']

In many cases we can also use the XPath function normalize-space(), which produces from its string argument another string in which leading and trailing whitespace is stripped and every run of whitespace within the string is replaced by a single space.

In the above case, we will simply use the following XPath expression:

/*/aChild[normalize-space() = 'Hello World']
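To compare the two XPath 1.0 approaches side by side, here is a hedged Java sketch using the JDK's javax.xml.xpath API. The class name NormalizeDemo, the helper countMatches and the sample values are invented for illustration. It suggests that translate() is the more permissive of the two, since stripping all whitespace also makes a value with no whitespace at all compare equal.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NormalizeDemo {
    // Hypothetical sample: several spaces, mixed leading/trailing/inner whitespace, none at all.
    static final String XML =
        "<top><aChild>Hello    World</aChild>"
      + "<aChild>  Hello\n\tWorld </aChild>"
      + "<aChild>HelloWorld</aChild></top>";

    // Evaluates an XPath 1.0 expression against XML and returns the number of selected nodes.
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact comparison: none of the raw values is exactly "Hello World".
        System.out.println(countMatches("/*/aChild[. = 'Hello World']"));                     // prints 0
        // normalize-space(): trims the ends and collapses each whitespace run to one space.
        System.out.println(countMatches("/*/aChild[normalize-space() = 'Hello World']"));     // prints 2
        // translate(): removes all whitespace, so "HelloWorld" matches as well.
        System.out.println(countMatches("/*/aChild[translate(., ' \t\n\r', '') = 'HelloWorld']")); // prints 3
    }
}
```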

Dimitre Novatchev
  • @DimitreNovatchev could you give one docs where such coding list can be found such as * -- the Tab*. – Arup Rakshit Aug 29 '13 at 15:09
  • @Babai, The whitespace characters are just these three: – Dimitre Novatchev Aug 29 '13 at 15:11
  • @DimitreNovatchev I am talking about any more for others .. Ok. Thanks for your prompt reply.. There is a very good [**`question`**](http://stackoverflow.com/questions/18513128/xpath-select-all-but-not-selfstrong-and-selfstrong-following-siblingtext). Can you please help OP.. I am trying but couldn't formulate.. Do you have time Sir ? – Arup Rakshit Aug 29 '13 at 15:14
  • 1
    @Babai, Unfortunately I am quite busy these days. You need to look at the Unicode Standard (http://www.unicode.org/) – Dimitre Novatchev Aug 30 '13 at 04:59
  • @DimitreNovatchev I tried this `concat(string(//textarea[@name="command"]/text()),' ')` but that doesn't happen, I got instead `1 `.. how to add a newline to a string using xpath ? – Arup Rakshit Aug 30 '13 at 13:05
  • @Babai, You are adding a NL to the string value of the first node taken from the first argument of concat() -- are you sure this is really what you want? – Dimitre Novatchev Aug 30 '13 at 14:10
  • @DimitreNovatchev Actually I want to print the text from the respective nodes line-by-line.. This is the need.. – Arup Rakshit Aug 30 '13 at 14:11
  • is it possible to use normalize-space with contains? something like aChild[contains(normalize-space(),'Hello World')] – AbtPst Dec 02 '15 at 16:23
  • @AbtPst, Yes, this is perfectly legal and could have its own use cases. – Dimitre Novatchev Dec 02 '15 at 17:12
  • but it does not seem to have any effect for me! does normalize-space() also deal with newlines? – AbtPst Dec 02 '15 at 17:25
  • @AbtPst, You may ask a question -- this is better than having comment dialogues. – Dimitre Novatchev Dec 02 '15 at 19:36
10

Try either this:

/Root/Child[normalize-space(text())='value without spaces']

or

/Root/Child[contains(text(),'value without spaces')]

or (since it looks like your test value may be the issue)

/Root/Child[normalize-space(text())=normalize-space('value with spaces')]

Haven't actually executed any of these so the syntax may be wonky.
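For what it's worth, here is a minimal Java sketch that runs this kind of expression against the document from the question, using the JDK's javax.xml.xpath API. The class name QuotedLiteralDemo and the helper selectedNames are made up for illustration. It assumes the test value is written as a quoted string literal, and it also shows that the predicate position decides which node comes back: /Root[Child = '...'] selects Root, while /Root/Child[...] selects the Child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QuotedLiteralDemo {
    // The document from the question.
    static final String XML =
        "<Root>\n"
      + "  <Child>value</Child>\n"
      + "  <Child>value with spaces</Child>\n"
      + "</Root>";

    // Returns the element names of all nodes selected by the expression.
    static List<String> selectedNames(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Predicate on Child: selects the second Child element.
        System.out.println(selectedNames(
            "/Root/Child[normalize-space(text()) = 'value with spaces']")); // prints [Child]
        // Predicate on Root: selects Root itself, not the Child.
        System.out.println(selectedNames("/Root[Child = 'value with spaces']")); // prints [Root]
    }
}
```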

zpea
kdgregory
  • Dimitre Novatchev's answer is thorough but I submit it is overkill for the majority of situations. I am upvoting this answer because it is simpler and will work for most situations. Well, at least in spirit:-)--the function is normalize-space() rather than normalize(). See: official documentation on the [normalize-space](http://www.w3.org/TR/xpath/#function-normalize-space) function. – Michael Sorens Jan 10 '11 at 19:54
2

Locating an attribute by a value containing whitespace using XPath

I have an input element whose value contains whitespace.

e.g.:

<input type="button"  value="Import&nbsp;Selected&nbsp;File">

I solved this by using the following XPath expression:

//input[contains(@value, 'Import') and contains(@value, 'Selected') and contains(@value, 'File')]

I hope this helps.
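A stricter alternative, sketched here in Java with the JDK's javax.xml.xpath API, is to match the non-breaking space itself. &nbsp; is the character U+00A0, which is not one of the XPath whitespace characters, so normalize-space() leaves it untouched. The sample below is an assumption: it is well-formed XML rather than HTML, so it writes the character as &#160;, and the class name NbspDemo, the helper countMatches and the second input element are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NbspDemo {
    // Hypothetical sample: the second input has the same words in a different order.
    static final String XML =
        "<form>"
      + "<input type=\"button\" value=\"Import&#160;Selected&#160;File\"/>"
      + "<input type=\"button\" value=\"File Selected Import\"/>"
      + "</form>";

    // Evaluates an XPath 1.0 expression against XML and returns the number of selected nodes.
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The contains() chain ignores word order, so both inputs match.
        System.out.println(countMatches("//input[contains(@value, 'Import') "
            + "and contains(@value, 'Selected') and contains(@value, 'File')]")); // prints 2
        // Exact match with the real U+00A0 characters in the literal.
        System.out.println(countMatches(
            "//input[@value = 'Import\u00A0Selected\u00A0File']"));               // prints 1
        // Map U+00A0 to an ordinary space first, then compare.
        System.out.println(countMatches(
            "//input[translate(@value, '\u00A0', ' ') = 'Import Selected File']")); // prints 1
        // normalize-space() does not treat U+00A0 as whitespace, so nothing matches.
        System.out.println(countMatches(
            "//input[normalize-space(@value) = 'Import Selected File']"));        // prints 0
    }
}
```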

gihan-maduranga
  • Well, it will also match 'File Selected Import', which sometimes could be a trap and certainly was not desirable in my case. – user15108 Jan 15 '15 at 13:46
1

"x0020" worked for me on a Jackrabbit-based CQ5/AEM repository in which the property names had spaces. The following works for a property named "Record ID":

[(jcr:contains(jcr:content/@Record_x0020_ID, 'test'))]
Manish Paul
0

None of the above solutions really worked for me. However, there's a much simpler solution.

When you create the XmlDocument, make sure you set its PreserveWhitespace property to true:

        XmlDocument xmldoc = new XmlDocument();
        xmldoc.PreserveWhitespace = true;
        xmldoc.Load(xmlCollection);
Francis
0

Did you try #x20?

Scott Evernden
0

I googled this and found it in about the second result:

Try replacing the space with "x0020".

This seems to have worked for the person who posted it.

melaos