
Last week, user @KunduK kindly helped me scrape a website to return the address of a particular record.

Record in question: https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN

I used the following snippet of code:

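    # Wait up to 10 s for a visible <p> that follows an <h4 data-aura-rendered-by> sibling and is the first <p> in its parent
    address = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h4[data-aura-rendered-by] ~p:nth-of-type(1)"))).text
    print(address)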
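For anyone else puzzling over that selector, here is how it resolves. `~` is the CSS general-sibling combinator, so it matches any later sibling of the `h4`, not only the one immediately after it. `:nth-of-type(1)` keeps only the first `<p>` among its parent's children. `find_element` then returns the first match in document order, and `.text` is the rendered text of that whole `<p>`, every line included.

The sketch below emulates those rules with Python's standard-library `html.parser` on a hypothetical fragment, not the real FCA markup. `TreeBuilder`, `first_p_after_h4` and `SAMPLE` are illustrative names. "Extra line" stands in for the unwanted row, on the assumption that it sits inside the same `<p>`; the missing screenshot cannot confirm this.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A bare-bones element: tag name, attributes, parent, children and text."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a small element tree; assumes explicitly closed, well-formed markup.

    No CSS is applied, so (unlike Selenium's .text) visibility is not modelled.
    """

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag == "br":
            # Approximate rendered text: a <br> becomes a line break.
            self.current.text += "\n"
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data


def first_p_after_h4(root, attr="data-aura-rendered-by"):
    """Return, in document order, every element matching `h4[attr] ~ p:nth-of-type(1)`."""
    matches = []

    def walk(parent):
        # :nth-of-type(1) -> the first <p> among this parent's children.
        first_p = next((c for c in parent.children if c.tag == "p"), None)
        seen_h4 = False  # '~' -> any earlier sibling h4, not only the adjacent one
        for child in parent.children:
            if child.tag == "h4" and attr in child.attrs:
                seen_h4 = True
            elif child.tag == "p" and child is first_p and seen_h4:
                matches.append(child)
            walk(child)

    walk(root)
    return matches


# Hypothetical markup, loosely shaped like a heading/value layout.
SAMPLE = (
    '<div class="block">'
    '<h4 data-aura-rendered-by="1:0">Address</h4>'
    '<p>Line one<br/>Line two<br/>Extra line</p>'
    '<p>Another paragraph</p>'
    '</div>'
    '<div class="block">'
    '<p>Intro paragraph</p>'  # first <p>, but no h4 precedes it -> no match
    '<h4 data-aura-rendered-by="2:0">Status</h4>'
    '<p>Value</p>'  # follows the h4, but is not the first <p> -> no match
    '</div>'
    '<div class="block">'
    '<h4 data-aura-rendered-by="3:0">Phone</h4>'
    '<p>0000 000000</p>'
    '</div>'
)

if __name__ == "__main__":
    builder = TreeBuilder()
    builder.feed(SAMPLE)
    found = first_p_after_h4(builder.root)
    # find_element() would return only found[0]; .text is that whole paragraph.
    print(repr(found[0].text))
```

In this model the first match carries all three lines of its paragraph. A selector that stops at the `<p>` cannot drop one line inside it; that would need post-processing of the text or a more specific locator.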

However, whilst trying to understand the snippet, I started to see some additional data being returned.

In the screenshot below, the left side shows the expected result and the right side shows what is actually being returned.

Inspecting the element, I can see an additional row (highlighted in yellow) that is not presented in the UI (right-hand side).

[Screenshot: Company Address]

I am also trying to get the "Website" and "Reference Number" fields by following the example provided before. However, following these steps (https://www.scrapingbee.com/blog/selenium-python/), I am not able to get the desired results.

Current Code:

    Website = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".accordion_text h4"))).text
    print(Website)

[Screenshots: Website inspect]

Looking forward to your help!

undetected Selenium
Masond3

1 Answer


To extract the Website address and the Firm reference number, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategies:

  • To extract the Website address:

    driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[text()='Website']/following-sibling::a[1]"))).get_attribute("href"))
    
  • To extract the Firm reference number:

    driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[text()='Firm reference number']/following-sibling::p[1]"))).text)
    
  • Console Output:

    https://www.masonowen.com/
    311960
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
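Both fields follow the same pattern: an `<h4>` label followed by the element holding the value. That means the XPath can be built from the label text.

Below is a minimal sketch with two hypothetical helpers, `label_xpath` and `read_field`; neither is part of Selenium. It makes two assumptions:
- The heading text matches the label exactly, since `text()=` does no whitespace trimming.
- The value is the first following sibling with the given tag.

```python
def label_xpath(label: str, tag: str = "p") -> str:
    """Build "//h4[text()=LABEL]/following-sibling::TAG[1]" for a labelled field."""
    if "'" not in label:
        literal = f"'{label}'"
    elif '"' not in label:
        literal = f'"{label}"'
    else:
        # XPath 1.0 string literals cannot escape quotes; concat() would be needed.
        raise ValueError("label contains both quote characters")
    return f"//h4[text()={literal}]/following-sibling::{tag}[1]"


def read_field(driver, label, tag="p", attribute=None, timeout=5):
    """Wait for the element after the labelled <h4>; return its text or an attribute."""
    # Imported here only so label_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, label_xpath(label, tag)))
    )
    return element.get_attribute(attribute) if attribute else element.text


# Usage (needs a live browser session in `driver`):
# driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
# print(read_field(driver, "Website", tag="a", attribute="href"))
# print(read_field(driver, "Firm reference number"))
```

The Selenium imports sit inside `read_field` purely so the string-building part can be tried without a browser. In a real script, the module-level imports listed in the note above are the more usual choice.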
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python



undetected Selenium
To extract the _Name_ and _Address_, see the [answer](https://stackoverflow.com/a/75448167/7429447) to your previous [question](https://stackoverflow.com/q/75400104/7429447). – undetected Selenium Feb 14 '23 at 13:02