
I'm trying to extract the email text from a list item, but without success. In particular, I've used this XPath:

//li/div/p//*[contains(., '@')]

but strangely it doesn't return anything. Could you help me? Here's the sample markup:

<li class="bgmp_list-item">
            <h3 class="bgmp_list-placemark-title">
                <a href="http://www.exemple.com" class=""> Name1 </a>
            </h3>

            <div class="bgmp_list-description">
                <p class="">
                    <strong class="">Responsible:</strong> John Doe                      <br>
                    <strong class="">Site:</strong> <a title="www.exemple.com" href="http://www.exemple.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','www.2ld.it']);" target="_blank" class="">www.2ld.it</a>
                    <br>
                    <strong class="">Email:</strong> some_email@email.com
                    <br>
                    <strong class="">Address:</strong> <a href="http://www.exemple.com" target="_blank" class="">3, Main Street 00000, London</a>
                    <br>
                    <strong>Tel:</strong> 00 000000 <strong>Fax:</strong> 0000000
                </p>

            </div>

Andrea Angeli

2 Answers


You're almost there. For the sample markup, the correct XPath would be:

//p/text()[contains(.,'@')]

Rather than reinvent the wheel here: there is a very good explanation of this in another answer.
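For anyone who wants to try the expression quickly, here is a minimal sketch in Python. It assumes the third-party lxml library is installed (the question doesn't say which tool evaluates the XPath), and the `SAMPLE` string and the `extract_emails` helper are hypothetical names made up for this illustration. The markup is a trimmed-down copy of the sample from the question.

```python
from lxml import html  # assumption: lxml is installed (pip install lxml)

# Trimmed-down copy of the question's markup, keeping only the relevant parts.
SAMPLE = """
<li class="bgmp_list-item">
  <div class="bgmp_list-description">
    <p class="">
      <strong class="">Responsible:</strong> John Doe <br>
      <strong class="">Email:</strong> some_email@email.com
      <br><strong>Tel:</strong> 00 000000
    </p>
  </div>
</li>
"""


def extract_emails(markup):
    """Return the stripped text nodes directly under <p> that contain '@'."""
    tree = html.fromstring(markup)
    nodes = tree.xpath("//p/text()[contains(., '@')]")
    # Text nodes keep their surrounding spaces and newlines, so strip them.
    return [node.strip() for node in nodes]


print(extract_emails(SAMPLE))  # ['some_email@email.com']
```

Note that the matched text node still carries the whitespace around the address, which is why the sketch strips each result.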

Rafael Almeida
  • Correct XPath (+1), but the linked answer doesn't explain why OP's initial XPath doesn't work while this one does. Notice that the form OP uses, `//*[contains(.,'ABC')]`, doesn't suffer from the same problem as `//*[contains(text(),'ABC')]` – har07 May 04 '16 at 23:13

With p//*[contains(., '@')] the predicate is applied to each descendant element of <p> (here the <strong>, <a> and <br> elements), and none of them contains the email address: the address sits in a text node that is a direct child of <p>, not inside any element. This is one of the reasons why the initial XPath didn't work. Applying the predicate to <p> directly should work:

//li/div/p[contains(., '@')]

but that returns the whole <p> element. If you need only the text node that contains the email address, apply the predicate to the individual text nodes within <p>, as mentioned in the other answer:

//li/div/p/text()[contains(., '@')]
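To see the difference between the three expressions side by side, here is a rough sketch in Python. It assumes the third-party lxml library (not something the question mentions), and `MARKUP` is a hypothetical, shortened version of the sample, so treat it as an illustration rather than a definitive test.

```python
from lxml import html  # assumption: lxml is installed (pip install lxml)

# Hypothetical, shortened version of the markup from the question.
MARKUP = """
<li>
  <div>
    <p>
      <strong>Site:</strong> <a href="http://www.exemple.com">www.2ld.it</a>
      <br>
      <strong>Email:</strong> some_email@email.com
      <br>
    </p>
  </div>
</li>
"""

tree = html.fromstring(MARKUP)

# 1. The question's expression: tests only the descendant *elements* of <p>
#    (<strong>, <a>, <br>). None of them has '@' in its text, so nothing matches.
elements = tree.xpath("//li/div/p//*[contains(., '@')]")

# 2. Predicate on <p> itself: matches, but the result is the whole element.
paragraphs = tree.xpath("//li/div/p[contains(., '@')]")

# 3. Predicate on the text nodes of <p>: returns only the node with the address.
texts = tree.xpath("//li/div/p/text()[contains(., '@')]")

print(elements)                    # []
print([p.tag for p in paragraphs]) # ['p']
print([t.strip() for t in texts])  # ['some_email@email.com']
```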
har07