3

No doubt that this is extremely basic, but it just won't "click" for me, despite the research that I've done so far. Given the following two HTML examples:

Example 1

<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>

Example 2

<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>

Now I have the following piece of xpath:

//div[.//div[normalize-space(text())='Locatie']]

According to other questions and websites about xpath, text() selects text nodes directly descending from the node we're searching on. Therefore, in example #1, I expect to retrieve the first child "div" element. This happens correctly: no issues there.

I expect the same result in example #2. However, this is not the case: apparently the "span" element disrupts this specific search. When I manually remove it, I successfully retrieve the required "div" element. Why is the search disrupted? The text should still be a direct child of the div element, whether or not the span element is there.

TLDR: Why does the "span" element prevent me from finding the second "div" element in example #2?

Tybs

3 Answers

3

As JaSON answered, this is because of the signature of the normalize-space() function. From the spec:

Function: string normalize-space(string?)

In XPath 1.0, whenever a string argument is needed, the language applies a type conversion by means of the string() function. From the specs:

A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.

So, the resulting node-set from the text() node test is reduced to the first node in document order and then that node is converted to its string-value.

This is where the often-overlooked whitespace-only text nodes come into play: your inner div element has two text nodes, and the first one contains only whitespace:

<div>
    <div>
        <!-- HERE ENDS THE FIRST --><span>
        </span> 
        Locatie
    <!-- HERE ENDS THE SECOND --></div>
    <div>
    </div>
</div>
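
To see the two text nodes concretely, here is a minimal sketch using Python's standard xml.dom.minidom. It is not an XPath engine: the normalize_space helper is a hypothetical stand-in for XPath's normalize-space(), and the markup is example 2 with the attributes stripped.

```python
import re
from xml.dom import minidom

# Example 2 with attributes removed; the whitespace layout is kept.
DOC = """<div>
    <div>
        <span>
        </span>
        Locatie
    </div>
    <div>
    </div>
</div>"""


def normalize_space(s):
    # Hypothetical stand-in for XPath normalize-space():
    # collapse runs of XML whitespace and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


outer = minidom.parseString(DOC).documentElement
inner = outer.getElementsByTagName("div")[0]  # the div that contains the span
inner.normalize()  # merge adjacent text nodes, as the XPath data model does

# Roughly the node-set that text() selects on the inner div.
text_nodes = [n.data for n in inner.childNodes if n.nodeType == n.TEXT_NODE]
normalized = [normalize_space(t) for t in text_nodes]

print(normalized)  # ['', 'Locatie']
```

The first entry is the whitespace before the span, and that is the only node normalize-space(text()) ever looks at.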

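Whenever you have mixed-content markup, it's better to test the string-value of the element rather than its text nodes. If you do need to test the text nodes, apply the predicate to each text node instead of passing the whole node-set to normalize-space():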

//div[.//div/text()[normalize-space()='Locatie']]
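
The difference between the two predicates can be sketched in plain Python with xml.dom.minidom. This only emulates the XPath semantics described above; first_text_equals, any_text_equals and normalize_space are hypothetical helper names, and both examples are reduced to bare div/span markup.

```python
import re
from xml.dom import minidom

EXAMPLE_1 = """<div>
    <div>
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div>
    <div>
        <span>
        </span>
        Locatie
    </div>
    <div>
    </div>
</div>"""


def normalize_space(s):
    # Hypothetical stand-in for XPath normalize-space().
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def child_texts(xml):
    # Child text nodes of the first inner div, in document order.
    inner = minidom.parseString(xml).documentElement.getElementsByTagName("div")[0]
    inner.normalize()
    return [n.data for n in inner.childNodes if n.nodeType == n.TEXT_NODE]


def first_text_equals(xml, target):
    # Emulates div[normalize-space(text())=target]:
    # the node-set is reduced to its first text node before conversion.
    texts = child_texts(xml)
    return normalize_space(texts[0] if texts else "") == target


def any_text_equals(xml, target):
    # Emulates div/text()[normalize-space()=target]:
    # the predicate is evaluated once per text node.
    return any(normalize_space(t) == target for t in child_texts(xml))


for name, xml in (("example 1", EXAMPLE_1), ("example 2", EXAMPLE_2)):
    print(name, first_text_equals(xml, "Locatie"), any_text_equals(xml, "Locatie"))
# example 1 True True
# example 2 False True
```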
Daniel Haley
Alejandro
2

I guess that's because normalize-space(text())='Locatie' only checks the first child text node (which is whitespace-only, so it normalizes to an empty string), while you need to check the second one:

//div[.//div[normalize-space(text()[2])='Locatie']]

If you need generic XPath that will work for both cases try

//div[normalize-space(div)='Locatie']
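
As a rough illustration of why the generic version works, the sketch below computes the string-value of the first child div (the concatenation of all its descendant text nodes) for both examples. It uses Python's xml.dom.minidom rather than an XPath engine; string_value, first_child_div and normalize_space are hypothetical helpers, and the markup is stripped of attributes.

```python
import re
from xml.dom import minidom

EXAMPLE_1 = """<div>
    <div>
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div>
    <div>
        <span>
        </span>
        Locatie
    </div>
    <div>
    </div>
</div>"""


def normalize_space(s):
    # Hypothetical stand-in for XPath normalize-space().
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def string_value(node):
    # XPath string-value of an element: all descendant text, in document order.
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def first_child_div(xml):
    # First div child of the root, i.e. the node normalize-space(div) converts.
    root = minidom.parseString(xml).documentElement
    return next(
        c for c in root.childNodes
        if c.nodeType == c.ELEMENT_NODE and c.tagName == "div"
    )


results = [normalize_space(string_value(first_child_div(x)))
           for x in (EXAMPLE_1, EXAMPLE_2)]
print(results)  # ['Locatie', 'Locatie']
```

Because the string-value includes the text of every descendant, this form would stop matching if the span itself contained visible text.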
JaSON
0

It may have something to do with the whitespace-only text nodes (it's way over my pay grade...), because with this change of focus, the following expression seems to work with most (not all) XPath testers:

.//div[text()[contains(.,'Locat')]]
Jack Fleeting