I have an XPath expression that works perfectly in Google Chrome's XPath Helper tool. Open the web page requested in the code below
and paste this into the XPath tool:
//dd[@class='open-hours']//div//span/following-sibling::text()
and you will get a large paragraph starting with: "For any three-day period..."
The screenshot below shows our starting element and the text we're trying to get:
Using lxml.html, I start with the expression "//dd[@class='open-hours']//div" and then loop through the div tags collecting info, but in this case it returns data from elsewhere on the page, an address: Washington, DC 20001
Can someone please explain why what seems to be the same XPath does not produce the expected output when used in code?
import requests, time, socket
import lxml.html as lxml

response = requests.get('https://www.yellowpages.com/washington-dc/mip/bnsf-railway-496598824')
data = response.text
tree = lxml.fromstring(data)
stuff = tree.xpath("//dd[@class='open-hours']//div")
for elem in stuff:
    try:
        for tbl in elem:
            if tbl.tag == 'span':
                header = tbl.text
                blob = ''
                if (len(elem.xpath("//span/following-sibling::text()"))) == 1:
                    blob = elem.xpath("//span/following-sibling::text()")[0].strip()
                print(blob)
    except Exception as detail:
        print(str(detail))
        print(str(elem.text_content().strip()))
        continue
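
To narrow this down, here is a minimal sketch that reproduces the same behavior offline. The HTML is made up for illustration (it is not the real page markup), and the names SAMPLE, absolute and relative are hypothetical. It suggests that a path starting with // is evaluated from the document root even when xpath() is called on an element, whereas a path starting with . stays within that element.

```python
import lxml.html

# Made-up HTML for illustration only; NOT the real page markup.
SAMPLE = """
<html><body>
  <p class="adr"><span>Address</span>Washington, DC 20001</p>
  <dl><dd class="open-hours">
    <div><span>Note</span>For any three-day period...</div>
  </dd></dl>
</body></html>
"""

tree = lxml.html.fromstring(SAMPLE)
div = tree.xpath("//dd[@class='open-hours']//div")[0]

# Leading // : searched from the document root, so spans outside
# the div (such as the one in the address paragraph) also match.
absolute = div.xpath("//span/following-sibling::text()")

# Leading . : searched relative to the div itself.
relative = div.xpath("./span/following-sibling::text()")

print(absolute)
print(relative)
```

In this sketch, absolute contains the address text as well as the paragraph, while relative contains only the paragraph that follows the span inside the div.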