Problem with findElement method webscraping with Selenium Python

Question

I am a building restoration student and I am learning to scrape. I am working on the collection of data from churches in Spain. For this I am working with the Catastro website. I'm collecting the data and I'm having trouble getting the src of the images.

Below is part of the code I have written. It throws an error at the "# Get the URL of the image" step. When I open the page manually in the browser I can find the image, but I can't find a way to do it with Selenium. Could it be because the element is nested inside a ::before?

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Start a webdriver session using Firefox
driver = webdriver.Firefox()

# Go to the website
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")

# Wait until the map element is present and click on its center
map_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="map"]'))
)
driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
map_element.click()

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

# Get the src attribute of the image element
img_src = img_element.get_attribute("src")

# Print the src of the image
print(img_src)

Shawn · Accepted Answer · 2023-02-09T18:20:43.063

There is an iframe that you need to switch into first, before the line below can find the element:

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

Solution: use the code below to switch to the frame, then perform the other actions.

driver.switch_to.frame(driver.find_element(By.XPATH,"//div[@class='modal-content']//iframe"))
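If it is not obvious which frame holds the element, a small diagnostic helper can check each one in turn. The following is only a sketch: find_frame_containing is a hypothetical name, it only looks at top-level frames (not frames nested inside other frames), and it relies only on the driver's find_elements and switch_to calls.

```python
def find_frame_containing(driver, frame_locator, target_locator):
    """Return the index of the first top-level frame in which
    target_locator matches at least one element, or None.

    Both locators are (by, value) tuples, e.g. (By.TAG_NAME, "iframe").
    Hypothetical helper for diagnosis; nested frames are not searched.
    """
    driver.switch_to.default_content()
    frame_count = len(driver.find_elements(*frame_locator))
    for index in range(frame_count):
        # Go back to the top-level document and look the frame up again,
        # so a stale element reference is never passed to switch_to.frame.
        driver.switch_to.default_content()
        frame = driver.find_elements(*frame_locator)[index]
        driver.switch_to.frame(frame)
        found = driver.find_elements(*target_locator)
        if found:
            driver.switch_to.default_content()
            return index
    # Always leave the driver on the top-level document.
    driver.switch_to.default_content()
    return None
```

For the page in the question it might be called as find_frame_containing(driver, (By.TAG_NAME, "iframe"), (By.ID, "ImgFachada0")) after clicking the map. The returned index refers to the list from driver.find_elements(By.TAG_NAME, "iframe"), so the matching frame element can be looked up there and passed to driver.switch_to.frame.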

Full working code for your reference:

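import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")
# Wait for the map element and click it, as in the question
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='map']"))).click()
# Give the modal with the iframe a moment to load
time.sleep(3)
# Switch into the iframe before looking for the image
driver.switch_to.frame(driver.find_element(By.XPATH,"//div[@class='modal-content']//iframe"))
img_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='ImgFachada0']")))
img_src = img_element.get_attribute("src")
print(img_src)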

Console output:

https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=bf9e5588d83361af1bffe7521e86dd68ea6a3f0b

Process finished with exit code 0

Don't forget to switch back to the main page after your actions on the iframe:

#To switch back from iframe
driver.switch_to.default_content()
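One way to make sure the switch back always happens, even if a lookup inside the frame raises, is to wrap both switches in a context manager. This is a minimal sketch and not part of Selenium itself: inside_frame is a hypothetical helper name, and frame is whatever driver.switch_to.frame accepts (for example the iframe element found above).

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` for the duration of the with-block, then
    return to the top-level document, even if an exception is raised."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs on normal exit and on error alike.
        driver.switch_to.default_content()
```

With this helper, the image lookup could be written inside "with inside_frame(driver, iframe_element):" and the driver would be back on the main page once the block ends.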


score 0 · Answer 2 · answered Feb 09 '23 at 23:47

The desired <img> element is within an <iframe>.

Solution

To extract the value of the src attribute you have to:

Use WebDriverWait to wait for the desired frame to be available and switch to it.
Use WebDriverWait to wait for the desired element to be visible.

You can use the following locator strategies:

driver.get('https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-sm.btn-sec-inverted"))).click()
map_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='map']")))
driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
map_element.click()
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[contains(@src, 'OVCListaBienes')]")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@id='ImgFachada0']"))).get_attribute("src"))
driver.quit()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console output:

https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=8a799d3f10ec7a9ec8f6937d450581bd75d2b750
