1

I am trying to get the websites of firms from Bloomberg using XPath, but I am stuck because it always returns an empty list. After several tests I found that I can't locate any element on this webpage. This is the code I'm using:

import requests
from lxml import html

url = "https://www.bloomberg.com/profile/company/FWLT:US"
# Use a separate name for the response so the requests module is not shadowed
response = requests.get(url)
tree = html.fromstring(response.content)
website = tree.xpath('//*[@id="root"]/div/section/div[2]/section/div/section[7]/div/text()')
print(website)

I also tried Selenium but ran into the same problem. Can someone help me solve this?

undetected Selenium
Quint Z

2 Answers

0

This will get you the website value:

website = tree.xpath('//h2[contains(text(), \'WEBSITE\')]/following-sibling::div')

Note that I've escaped the single quotes around 'WEBSITE'.
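
As written, the expression above returns element objects rather than strings; appending /text() yields the text itself. Here is a minimal sketch of that, run against a small hypothetical snippet (the SAMPLE markup is an assumption for illustration, not the real Bloomberg page structure):

```python
from lxml import html

# Hypothetical stand-in markup; the real page structure may differ.
SAMPLE = """
<section>
  <h2>WEBSITE</h2>
  <div>www.amecfw.com</div>
</section>
"""

tree = html.fromstring(SAMPLE)

# Double quotes outside avoid escaping the inner single quotes.
# The trailing /text() returns strings instead of element objects.
website = tree.xpath("//h2[contains(text(), 'WEBSITE')]/following-sibling::div/text()")
print(website)
```

This only shows that the XPath works when the expected markup is actually present in the response.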

Alin Stelian
  • Hi Alin, I tried yours but it still doesn't work. In fact, it returns an empty list no matter what XPath I use. – Quint Z Sep 15 '20 at 19:00
  • That's because the website blocks unusual access. Try to print "tree" and you will find this: "We've detected unusual activity from your computer network. To continue, please click the box below to let us know you're not a robot." – Alin Stelian Sep 15 '20 at 19:23
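
One way to confirm the diagnosis in the comment above is to check the returned page text for the bot-check wording before running any XPath. The sketch below is only an illustration: the helper name looks_blocked and the marker phrases are assumptions based on the message quoted in the comment, and the real block page may be worded differently.

```python
# Marker phrases taken from the message quoted in the comment;
# these are assumptions and the actual block page may differ.
BLOCK_MARKERS = (
    "detected unusual activity",
    "not a robot",
)


def looks_blocked(page_text: str) -> bool:
    """Return True if the page text contains a known bot-check phrase."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


blocked_sample = "We've detected unusual activity from your computer network"
normal_sample = "<h2>WEBSITE</h2><div>www.amecfw.com</div>"

print(looks_blocked(blocked_sample))  # True
print(looks_blocked(normal_sample))   # False
```

With the code from the question, one could pass the text of the response returned by requests.get(url) to looks_blocked() and skip the XPath step when it returns True.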
0

To print the text www.amecfw.com with Selenium, you can use either of the following locator strategies:

  • Using xpath, following and get_attribute():

    print(driver.find_element_by_xpath("//h2[text()='WEBSITE']//following::div").get_attribute("innerHTML"))
    
  • Using xpath, following-sibling and text attribute:

    print(driver.find_element_by_xpath("//h2[text()='WEBSITE']//following-sibling::div").text)
    

Ideally, to print the text www.amecfw.com you should wait for the element with WebDriverWait and visibility_of_element_located(), using either of the following locator strategies:

  • Using xpath, following and get_attribute():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[text()='WEBSITE']//following::div"))).get_attribute("innerHTML"))
    
  • Using xpath, following-sibling and text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[text()='WEBSITE']//following-sibling::div"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python



undetected Selenium