XPath extract plain email text

Question

I'm trying to extract the email text from a list, but without success. In particular, I've used this XPath:

//li/div/p//*[contains(., '@')]

but strangely it doesn't work! Could you help me? Here's the sample markup:

<li class="bgmp_list-item">
            <h3 class="bgmp_list-placemark-title">
                <a href="http://www.exemple.com" class=""> Name1 </a>
            </h3>

            <div class="bgmp_list-description">
                <p class="">
                    <strong class="">Responsible:</strong> John Doe                      <br>
                    <strong class="">Site:</strong> <a title="www.exemple.com" href="http://www.exemple.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','www.2ld.it']);" target="_blank" class="">www.2ld.it</a>
                    <br>
                    <strong class="">Email:</strong> some_email@email.com                        
        <br><strong class="">Address:</strong> <a href="http://www.exemple.com" target="_blank" class="">3, Main Street 00000, London</a>                        <br>
                    <strong>Tel:</strong> 00 000000 <strong>Fax:</strong> 0000000                    
        </p>

            </div>

score 1 · Answer 1 · edited May 23 '17 at 12:16


You're almost there, but not quite. For the sample markup, the correct XPath would be

//p/text()[contains(.,'@')]

Rather than reinvent the wheel: there is a very good explanation of this in another answer.
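To try the expression outside a browser, here is a minimal sketch. It assumes Python with the third-party `lxml` package, which the question does not mention; `SAMPLE` is a trimmed-down copy of the markup from the question, and the names `SAMPLE`, `matches` and `emails` are only illustrative.

```python
from lxml import html

# Trimmed-down copy of the markup from the question (assumption: the
# real page has the same <li>/<div>/<p> nesting).
SAMPLE = """
<li class="bgmp_list-item">
  <div class="bgmp_list-description">
    <p class="">
      <strong class="">Responsible:</strong> John Doe <br>
      <strong class="">Email:</strong> some_email@email.com
      <br><strong>Tel:</strong> 00 000000
    </p>
  </div>
</li>
"""

tree = html.fromstring(SAMPLE)

# text() selects the text nodes that are direct children of <p>;
# the predicate keeps only the ones containing '@'.
matches = tree.xpath("//p/text()[contains(., '@')]")

# The matched text node still carries the surrounding whitespace,
# so strip it to get the bare address.
emails = [m.strip() for m in matches]
print(emails)
```

The XPath only narrows things down to the text node; trimming the whitespace around the address is left to the host language here.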

answered May 04 '16 at 22:47

Rafael Almeida


Correct XPath (+1), but the linked answer doesn't explain why OP's initial XPath doesn't work while this one does. Notice that the form of XPath which OP uses, `//*[contains(.,'ABC')]`, doesn't suffer from the same problem as `//*[contains(text(),'ABC')]` – har07 May 04 '16 at 23:13

score 0 · Answer 2 · answered May 04 '16 at 23:04

By using `p//*[contains(., '@')]` you apply the predicate to the individual descendant elements of `<p>`, but no such element contains the email address, because the address is a text node that is a direct child of `<p>`. This is one of the reasons the initial XPath didn't work. Applying the predicate to `<p>` directly should work:

//li/div/p[contains(., '@')]

but that will return the `<p>` element. If you need only the text node that contains the email address, then the predicate should be applied to the individual text nodes within `<p>`, as mentioned in the other answer:

//li/div/p/text()[contains(., '@')]
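The difference between the three expressions can be checked side by side. The following is only a sketch under the assumption that Python and the third-party `lxml` package are available (neither appears in the question); `SAMPLE` is a shortened version of the question's markup, and `original`, `on_p` and `on_text` are illustrative names.

```python
from lxml import html

# Shortened version of the question's markup; none of the child
# elements of <p> (strong, br, a) has '@' in its text content.
SAMPLE = """
<li class="bgmp_list-item">
  <div class="bgmp_list-description">
    <p class="">
      <strong class="">Site:</strong> <a href="http://www.exemple.com">www.2ld.it</a>
      <br>
      <strong class="">Email:</strong> some_email@email.com
      <br><strong>Tel:</strong> 00 000000
    </p>
  </div>
</li>
"""

tree = html.fromstring(SAMPLE)

# Predicate tested against each descendant ELEMENT of <p>.
original = tree.xpath("//li/div/p//*[contains(., '@')]")

# Predicate tested against <p> itself; its string value includes
# the address, so the whole element is returned.
on_p = tree.xpath("//li/div/p[contains(., '@')]")

# Predicate tested against each TEXT NODE that is a child of <p>.
on_text = tree.xpath("//li/div/p/text()[contains(., '@')]")

print(len(original))             # number of matching descendant elements
print([el.tag for el in on_p])   # tags of the matching elements
print([t.strip() for t in on_text])
```

With this sample, the first expression matches nothing, the second matches the `<p>` element as a whole, and the third matches just the text node holding the address.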
