Cannot locate elements by XPath on Python

Question

I was trying to get the websites of firms from Bloomberg using XPath, but I got stuck because it always returns an empty list. After several tests, I found that I can't locate any elements on this webpage. This is the code I'm using:

import re 
import requests
from lxml import html

url = "https://www.bloomberg.com/profile/company/FWLT:US"
response = requests.get(url)  # avoid rebinding the name of the requests module
tree = html.fromstring(response.content)
website = tree.xpath('//*[@id="root"]/div/section/div[2]/section/div/section[7]/div/text()')
print(website)

I also tried Selenium but ran into the same problem. Can someone help me solve this?

score 0 · Answer 1 · answered Sep 15 '20 at 17:48

This will get you the website value:

website = tree.xpath('//h2[contains(text(), \'WEBSITE\')]/following-sibling::div')

Be aware that I've escaped the single quotes around 'WEBSITE', because the XPath string itself is delimited by single quotes.
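As a rough illustration of how this expression behaves, here is a minimal sketch run against a made-up HTML snippet. The markup in SAMPLE_PAGE is an assumption that only mimics an h2 heading followed by a sibling div; the real page may be structured differently. Wrapping the XPath in double quotes avoids the escaping.

```python
from lxml import html

# Hypothetical markup (an assumption, not the real Bloomberg page).
SAMPLE_PAGE = """
<html><body>
  <section>
    <h2>WEBSITE</h2>
    <div>www.amecfw.com</div>
  </section>
</body></html>
"""

tree = html.fromstring(SAMPLE_PAGE)

# Double quotes outside, single quotes inside: no backslashes needed.
nodes = tree.xpath("//h2[contains(text(), 'WEBSITE')]/following-sibling::div")

# The expression returns elements, so extract their text explicitly.
website = [node.text_content().strip() for node in nodes]
print(website)
```

Note that the XPath selects div elements rather than strings, so the text has to be read from each element afterwards.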

Alin Stelian

Hi Alin, I tried yours but it still doesn't work. In fact, it returns an empty list no matter what XPath I use... – Quint Z Sep 15 '20 at 19:00
That's because the website blocks unusual access. Try printing the content of "tree" and you will find this: "We've detected unusual activity from your computer network. To continue, please click the box below to let us know you're not a robot." – Alin Stelian Sep 15 '20 at 19:23
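One way to catch this early is to check the downloaded page for the block message before running any XPath. The following is only a sketch: the helper name looks_blocked and the marker phrases are assumptions derived from the message quoted above, and the site may word its block page differently.

```python
# Phrases taken from the block message quoted in the comment above (assumed
# to be stable; adjust them if the site changes its wording).
BLOCK_MARKERS = (
    "detected unusual activity",
    "not a robot",
)


def looks_blocked(page_text: str) -> bool:
    """Return True if the page text resembles the anti-bot interstitial."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Made-up sample inputs for illustration only.
sample_block_page = "We've detected unusual activity from your computer network"
sample_profile_page = "<h2>WEBSITE</h2><div>www.amecfw.com</div>"

print(looks_blocked(sample_block_page))    # True
print(looks_blocked(sample_profile_page))  # False
```

With the question's code, this check would be applied to the text of the downloaded response before building the tree, so that an empty XPath result can be told apart from a blocked request.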

score 0 · Accepted Answer · answered Sep 15 '20 at 20:40

To print the text www.amecfw.com using Selenium, you can use either of the following locator strategies:

Using xpath, following and get_attribute():

print(driver.find_element(By.XPATH, "//h2[text()='WEBSITE']//following::div").get_attribute("innerHTML"))

Using xpath, following-sibling and text attribute:

print(driver.find_element(By.XPATH, "//h2[text()='WEBSITE']//following-sibling::div").text)

Ideally, to print the text www.amecfw.com, you should first wait for the element to become visible by using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:

Using xpath, following and get_attribute():

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[text()='WEBSITE']//following::div"))).get_attribute("innerHTML"))

Using xpath, following-sibling and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[text()='WEBSITE']//following-sibling::div"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Outro

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
