I am a building restoration student learning web scraping. I am collecting data on churches in Spain from the Catastro website, and I am having trouble getting the src of the images.

Below is part of the code I have written. It throws an error at the "# Get the URL of the image" step. When I open the page manually in the browser I can find the image, but I can't find a way to do it with Selenium. Could it be because the element is nested inside a ::before?

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Start a webdriver session using Firefox
driver = webdriver.Firefox()

# Go to the website
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")

# Wait until the map element is present and click on its center
map_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="map"]'))
)
driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
map_element.click()

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

# Get the src attribute of the image element
img_src = img_element.get_attribute("src")

# Print the src of the image
print(img_src)
undetected Selenium
Bazofia

2 Answers

The image is inside an iframe, so you need to switch to that frame before the code below can find the element:

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

Solution: use the code below to switch to the frame, then perform the other actions:

driver.switch_to.frame(driver.find_element(By.XPATH,"//div[@class='modal-content']//iframe"))

Full working code for your reference:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")
# Wait for the map and click it
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='map']"))).click()
time.sleep(3)
# Switch into the iframe before locating the image
driver.switch_to.frame(driver.find_element(By.XPATH,"//div[@class='modal-content']//iframe"))
img_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='ImgFachada0']")))
img_src = img_element.get_attribute("src")
print(img_src)

Console output:

https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=bf9e5588d83361af1bffe7521e86dd68ea6a3f0b

Process finished with exit code 0

Don't forget to switch back to the main page after your actions on the iframe:

# Switch back from the iframe to the main document
driver.switch_to.default_content()
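
Because it is easy to forget the switch back, one option is to wrap both switches in a small context manager. This is only a sketch: the helper name inside_frame is hypothetical (it is not part of Selenium), and it relies only on the driver.switch_to.frame(...) and driver.switch_to.default_content() calls already shown above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level document.

    `inside_frame` is a hypothetical helper, not a Selenium API.
    `frame` is anything driver.switch_to.frame() accepts, for example
    the iframe WebElement located in the answer above.
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the lookup inside the frame raises, so later
        # find_element calls are not left stuck inside the iframe.
        driver.switch_to.default_content()


# Usage with the driver from the answer above (not executed here):
# iframe = driver.find_element(By.XPATH, "//div[@class='modal-content']//iframe")
# with inside_frame(driver, iframe):
#     img_src = driver.find_element(By.ID, "ImgFachada0").get_attribute("src")
```

The try/finally means the driver returns to the main page even when the image is not found, which matters when looping over many buildings.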

The iframe in your HTML, for reference: [screenshot not preserved]

Shawn

The desired <img> element is within an <iframe>:

[Screenshot: frame]


Solution

To extract the value of the src attribute you have to:

  • Use WebDriverWait to wait until the desired frame is available and switch to it.

  • Use WebDriverWait to wait until the desired element is visible.

  • You can use the following locator strategies:

    driver.get('https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-sm.btn-sec-inverted"))).click()
    map_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='map']")))
    driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
    map_element.click()
    WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[contains(@src, 'OVCListaBienes')]")))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@id='ImgFachada0']"))).get_attribute("src"))
    driver.quit()
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console output:

    https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=8a799d3f10ec7a9ec8f6937d450581bd75d2b750
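
When collecting many buildings, it may help to record which cadastral reference each src belongs to. The sketch below pulls the query parameters out of the printed URL using only the standard library. The function name src_params is hypothetical. Note that the captcha value differs between the two console outputs in this thread, so it is probably safer to assume (this is an assumption, not something the site documents here) that a stored URL may not work in a later session.

```python
from urllib.parse import urlsplit, parse_qs


def src_params(img_src):
    """Return the query parameters of an image URL as a flat dict.

    Hypothetical helper for illustration; keeps the first value of
    each parameter.
    """
    query = urlsplit(img_src).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# The src printed in the console output above
src = (
    "https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx"
    "?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes"
    "&captcha=8a799d3f10ec7a9ec8f6937d450581bd75d2b750"
)
params = src_params(src)
print(params["refcat"])  # 9271101WJ9197A0001BR
```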
    

undetected Selenium