Why does normalize-space(text()) not work with a preceding child element?

Question

No doubt this is extremely basic, but it just won't "click" for me, despite the research I've done so far. Given the following two HTML examples:

Example 1

<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>

Example 2

<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>

Now I have the following piece of xpath:

//div[.//div[normalize-space(text())='Locatie']]

According to other questions and websites about XPath, text() selects the text nodes that are direct children of the node we're searching on. Therefore, in example #1, I expect to retrieve the first child "div" element. This works correctly: no issues there.

I expect the same result in example #2. However, this is not the case: apparently the "span" element disrupts this specific search. When I manually remove it, I successfully retrieve the required "div" element. Why is the search disrupted? The text should still be a direct child of the div element, whether or not the span element is there.

TLDR: Why does the "span" element prevent me from finding the second "div" element in example #2?

score 3 · Accepted Answer · edited Apr 16 '19 at 03:46

As Jason answered, this is because of the signature of the normalize-space() function. From the spec:

Function: string normalize-space(string?)

In XPath 1.0, whenever a string argument is needed, the language applies a type conversion by means of the string() function. From the specs:

A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.

So the node-set produced by the text() node test is reduced to its first node in document order, and that node is then converted to its string-value.
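To see this conversion at work, here is a minimal sketch in Python that emulates normalize-space(text()) with the standard-library xml.dom.minidom parser instead of a real XPath engine. The helper names normalize_space and normalize_space_of_text are hypothetical, normalize_space only approximates the XPath rule (Python's str.split() treats a few more characters as whitespace), and the sketch relies on the question's markup happening to be well-formed XML.

```python
from xml.dom import minidom

# The question's two examples; both happen to be well-formed XML.
EXAMPLE_1 = """<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim, collapse whitespace runs.
    return " ".join(s.split())


def normalize_space_of_text(element):
    # Emulates normalize-space(text()): the node-set of child text nodes is
    # converted to a string by keeping only the FIRST one in document order
    # (an empty node-set becomes '').
    texts = [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]
    return normalize_space(texts[0].data if texts else "")


for name, markup in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
    # getElementsByTagName() walks in document order, so [1] is the inner div.
    inner = minidom.parseString(markup).getElementsByTagName("div")[1]
    print(name, repr(normalize_space_of_text(inner)))
```

This prints 'Locatie' for Example 1 but '' for Example 2: in Example 2 the first text child is the whitespace before the span, so the comparison with 'Locatie' fails.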

This is where the always-overlooked whitespace-only text nodes come into play: your inner div element has two text nodes:

<div>
    <div>
        <!-- HERE ENDS THE FIRST --><span>
        </span> 
        Locatie
    <!-- HERE ENDS THE SECOND --></div>
    <div>
    </div>
</div>
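A quick way to confirm this is to list the inner div's own text children. The sketch below assumes Python's xml.dom.minidom, which keeps whitespace-only text nodes by default; the variable names are illustrative only.

```python
from xml.dom import minidom

# Example 2 from the question (well-formed XML, so minidom can parse it).
EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""

# getElementsByTagName() walks in document order, so [1] is the header div.
header = minidom.parseString(EXAMPLE_2).getElementsByTagName("div")[1]

# What text() selects: the header div's own text children, in document order.
# The whitespace inside the span is a child of the span, so it is not listed.
text_nodes = [n.data for n in header.childNodes if n.nodeType == n.TEXT_NODE]
for position, data in enumerate(text_nodes, start=1):
    print(position, repr(data))
```

Two strings are printed: the first is only a newline plus indentation, and only the second contains Locatie.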

Whenever you have mixed-content markup, it's better to test the element's string-value rather than its individual text nodes. If you do want to test the text nodes, check each one separately with this expression:

//div[.//div/text()[normalize-space()='Locatie']]
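As a rough illustration of why this version works, the Python sketch below emulates the expression with xml.dom.minidom, assuming (as before) that the examples parse as XML. The helpers normalize_space, has_matching_text and select are hypothetical names; the key point is that the predicate is applied to every text node rather than to a single converted string.

```python
from xml.dom import minidom

EXAMPLE_1 = """<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim, collapse whitespace runs.
    return " ".join(s.split())


def has_matching_text(div, wanted):
    # text()[normalize-space()=wanted]: the predicate is tested against EACH
    # child text node on its own, so a leading whitespace-only node is harmless.
    return any(
        n.nodeType == n.TEXT_NODE and normalize_space(n.data) == wanted
        for n in div.childNodes
    )


def select(markup, wanted="Locatie"):
    # //div[.//div/text()[normalize-space()=wanted]]
    doc = minidom.parseString(markup)
    return [
        d
        for d in doc.getElementsByTagName("div")
        # Element.getElementsByTagName() returns descendants only, like .//div
        if any(has_matching_text(inner, wanted) for inner in d.getElementsByTagName("div"))
    ]


for name, markup in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
    print(name, [d.getAttribute("class") for d in select(markup)])
```

Both examples now select their outer container div.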

Plus 1 for mentioning mixed content. – Daniel Haley, Apr 16 '19 at 03:47

score 2 · Answer 2 · answered Apr 15 '19 at 18:29

I guess that's because normalize-space(text())='Locatie' checks only the first child text node (which is whitespace-only, so it normalizes to an empty string), while you need to check the second one:

//div[.//div[normalize-space(text()[2])='Locatie']]
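A small Python sketch (using xml.dom.minidom as a stand-in for an XPath engine; normalize_space_of_nth_text is a hypothetical helper) shows what the positional predicate does to each example:

```python
from xml.dom import minidom

EXAMPLE_1 = """<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim, collapse whitespace runs.
    return " ".join(s.split())


def normalize_space_of_nth_text(element, position):
    # normalize-space(text()[position]): XPath positions are 1-based, and a
    # missing text node gives an empty node-set, which converts to ''.
    texts = [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]
    return normalize_space(texts[position - 1].data) if len(texts) >= position else ""


for name, markup in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
    inner = minidom.parseString(markup).getElementsByTagName("div")[1]
    print(name, repr(normalize_space_of_nth_text(inner, 2)))
```

text()[2] finds 'Locatie' in Example 2 but yields '' in Example 1, whose inner div has only one text child, which is why a generic expression is still needed.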

If you need a generic XPath that works for both cases, try:

//div[normalize-space(div)='Locatie']
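This works because normalize-space(div) converts a node-set of elements rather than text nodes, so it uses the string-value of the first child div, which concatenates all of its descendant text (span included). A minimal Python emulation, assuming xml.dom.minidom and the hypothetical helpers string_value and select:

```python
from xml.dom import minidom

EXAMPLE_1 = """<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim, collapse whitespace runs.
    return " ".join(s.split())


def string_value(node):
    # XPath string-value: a text node's data, or for an element the
    # concatenation of all descendant text nodes in document order.
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def select(markup, wanted="Locatie"):
    # //div[normalize-space(div)=wanted]
    doc = minidom.parseString(markup)
    result = []
    for d in doc.getElementsByTagName("div"):
        # child::div is reduced to its FIRST div child, whose whole
        # string-value (including the span's whitespace) is normalized.
        child_divs = [
            c for c in d.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == "div"
        ]
        value = string_value(child_divs[0]) if child_divs else ""
        if normalize_space(value) == wanted:
            result.append(d)
    return result


for name, markup in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
    print(name, [d.getAttribute("class") for d in select(markup)])
```

Because the conversion keeps only the first div child, this relies on the header being the first div inside the container, as it is in both examples.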

score 0 · Answer 3 · answered Apr 15 '19 at 17:30


It may have something to do with the whitespace text nodes (it's way over my pay grade...), because with this change of focus, the following expression seems to work with most (not all) XPath testers:

.//div[text()[contains(.,'Locat')]]
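One likely reason this works is the same per-node testing described in the accepted answer: contains(., 'Locat') runs against each text child separately, so the whitespace-only node no longer matters. Note that this expression selects the inner div itself rather than its container. A rough Python emulation from the document root, assuming xml.dom.minidom and a hypothetical select helper:

```python
from xml.dom import minidom

EXAMPLE_1 = """<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>"""


def select(markup, needle="Locat"):
    # .//div[text()[contains(., needle)]], evaluated from the document root:
    # keep each div that has at least one child text node containing needle.
    doc = minidom.parseString(markup)
    return [
        d
        for d in doc.getElementsByTagName("div")
        if any(n.nodeType == n.TEXT_NODE and needle in n.data for n in d.childNodes)
    ]


for name, markup in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
    print(name, [d.getAttribute("class") for d in select(markup)])
```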


Jack Fleeting

