I am trying to extract text that is not nested within an HTML element. Here is the HTML:

<div class="col-sm-12">
  <i class='fa fa-map-marker'></i>theCity
  <i class='fa fa-link'></i>theEmail
  <i class='fa fa-phone'></i>thePhone1
  <i class='fa fa-phone'></i>thePhone2
  <b>Fax:</b>theFax
  <b>Address:</b>theAddress
</div>

I wanted to get the following results

  • theCity
  • theEmail
  • thePhone1
  • thePhone2
  • theFax
  • theAddress

As you can see, they have different formats. theCity, theEmail, thePhone1 and thePhone2 share one format (text following an i element), while theFax and theAddress share another (text following a b element). I tried getting both types of data using the following statements, but neither worked.

Here is the code I tried for the fax and address

//b/following-sibling::text()[1]

Here is the code for the city, email and phone data types

normalize-space(//div[@class="fa-map-marker"]/following-sibling::text())

What am I doing wrong?

kjhughes
EngAbth9

1 Answer

What am I doing wrong?

  1. For the i-based labels, note that the fa fa-map-marker classes are on the i element, not the div. Also, if you wish to use equality, you have to test against the entire attribute value ("fa fa-map-marker", not just "fa-map-marker"). If you wish to use contains() for a more robust solution, see "XPath to match @class value and element value?". Finally, don't forget the [1] to ensure you only get the immediately following text node.
  2. For the b-based labels, test the content of the b element (for example, b[.="Fax:"]) before stepping along the following-sibling:: axis, which you are already using properly.

Here are the XPath expressions to select each of those targets:

  • theCity: //i[@class="fa fa-map-marker"]/following-sibling::text()[1]
  • theEmail: //i[@class="fa fa-link"]/following-sibling::text()[1]
  • thePhone1: //i[@class="fa fa-phone"][1]/following-sibling::text()[1]
  • thePhone2: //i[@class="fa fa-phone"][2]/following-sibling::text()[1]
  • theFax: //b[.="Fax:"]/following-sibling::text()[1]
  • theAddress: //b[.="Address:"]/following-sibling::text()[1]
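One way to try these selections without installing anything is Python's standard library, shown here as a rough sketch under a few assumptions. `xml.etree.ElementTree` supports only a subset of XPath: it has no `text()` node test and no `following-sibling::` axis, so the sketch substitutes each element's `.tail`, which holds the text that directly follows the element. For this markup that is the same node `following-sibling::text()[1]` selects. Its positional predicates also differ from real XPath, so the two phone entries are collected with `findall` rather than `[1]` and `[2]`. The helper `tail_text` is a hypothetical name, the markup is assumed to be well-formed XML, and the `[.='Fax:']` predicate needs Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# The markup from the question; it happens to be well-formed XML.
HTML = """
<div class="col-sm-12">
  <i class='fa fa-map-marker'></i>theCity
  <i class='fa fa-link'></i>theEmail
  <i class='fa fa-phone'></i>thePhone1
  <i class='fa fa-phone'></i>thePhone2
  <b>Fax:</b>theFax
  <b>Address:</b>theAddress
</div>
"""

root = ET.fromstring(HTML.strip())


def tail_text(elem):
    """Return the text directly after elem, with surrounding whitespace removed."""
    return (elem.tail or "").strip()


# Equality on @class must match the entire attribute value.
city = tail_text(root.find(".//i[@class='fa fa-map-marker']"))
email = tail_text(root.find(".//i[@class='fa fa-link']"))

# Both phone labels share a class, so collect them in document order.
phones = [tail_text(i) for i in root.findall(".//i[@class='fa fa-phone']")]

# For the b-based labels, match on the element's text content.
fax = tail_text(root.find(".//b[.='Fax:']"))
address = tail_text(root.find(".//b[.='Address:']"))

print(city, email, phones, fax, address)
```

With a full XPath 1.0 engine, the expressions listed above can be used as written; the `.tail` lookup is only a stand-in for the axis that ElementTree lacks.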
kjhughes
  • Thanks, your answer works perfectly. One more thing I wanted to know: what if I also wanted to scope the fa fa-... classes and the b-based labels to a parent div by id? Assume there is another div above the col-sm-12 div, with an id of 'theList', that nests everything listed above. How can we incorporate that into your answer? – EngAbth9 May 10 '20 at 16:03
  • Would something like //div[@id="theList"]//i[@class="fa fa-map-marker"]/following-sibling::text()[1] work? – EngAbth9 May 10 '20 at 16:09
  • Yes, specifying the heritage of the `i` or `b` elements in that manner is a fine way to narrow scope. – kjhughes May 10 '20 at 16:19