I am using BeautifulSoup to extract image URLs from an HTML structure in Python. The HTML structure contains several <img> tags with the src attribute. I've implemented the _get_images function, which uses BeautifulSoup's find_all("img") method to retrieve the image URLs. However, I'm facing an issue where some image URLs are returning as None even though the src attribute is present in the HTML.

Here's my _get_images function:

def _get_images(self, soup):
    article_images = []
    images = soup.find_all("img")

    for img in images:
        src = img.get('src')
        article_images.append(src)

    return article_images

The output shows that some URLs are None, while others are retrieved correctly. I have checked the HTML structure, and the <img> tags do contain the src attribute.

What could be causing this problem, and how can I resolve it to fetch all the image URLs correctly? My goal is to have a list in which each entry is the src of an image, with no None values present.

Elie Hacen

1 Answer

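Possibly the <img> elements are dynamic elements, that is, they are added or completed by JavaScript after the initial HTML is delivered, so the markup that BeautifulSoup parses may not yet contain their src values.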
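
One way to test this hypothesis is to list the <img> tags that lack a src in the raw HTML string your parser receives. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies, and the names ImgAttrCollector and images_missing_src, the sample markup, and the data-src attribute are assumptions, not taken from the question.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Collects the attributes of every <img> tag as a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        if tag == "img":
            self.images.append(dict(attrs))


def images_missing_src(html):
    """Return the attribute dicts of <img> tags whose src is absent or empty."""
    collector = ImgAttrCollector()
    collector.feed(html)
    return [attrs for attrs in collector.images if not attrs.get("src")]


# Sample markup (an assumption): the second image keeps its URL in data-src.
sample = '<img src="/a.png"><img data-src="/b.png" class="lazy">'
print(images_missing_src(sample))
```

If the tags reported here carry the URL under another attribute, or the expected tags do not appear at all, the page is probably filled in by a script, and a browser-driven approach is likely needed.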


Solution

To extract the values of the src attribute from the <img> elements, you have to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategy:

Code block:

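def _get_images(self):
    # `driver` is assumed to be an existing Selenium WebDriver with the page already loaded.
    # Wait up to 20 seconds until every <img> element is visible, then read each src.
    images = WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.TAG_NAME, "img")))
    return [img.get_attribute("src") for img in images]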

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
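
Since the goal is a list without None values, a small post-processing step can be applied to whatever list either approach returns. This is a minimal sketch under stated assumptions: clean_image_urls and the example URLs are hypothetical, and resolving relative paths with urljoin is an optional extra, not something the question asked for.

```python
from urllib.parse import urljoin


def clean_image_urls(srcs, base_url=""):
    """Drop missing src values and resolve the rest against base_url."""
    cleaned = []
    for src in srcs:
        if not src:  # skips both None and empty strings
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        cleaned.append(urljoin(base_url, src.strip()))
    return cleaned


# Example values are made up for illustration.
raw = ["/img/a.png", None, "", "https://cdn.example.com/b.jpg"]
print(clean_image_urls(raw, "https://example.com/post/1"))
# ['https://example.com/img/a.png', 'https://cdn.example.com/b.jpg']
```

Filtering hides missing values but does not explain them, so it is best used after the wait above, not instead of it.
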
undetected Selenium
    Thank you so much for your help! Your solution worked perfectly, and I was able to successfully extract all the image URLs with `src` attribute. I really appreciate your expertise and assistance in resolving the issue. Thanks again for taking the time to help me out! – Elie Hacen Jul 29 '23 at 18:38