I am using BeautifulSoup to extract image URLs from an HTML structure in Python. The HTML structure contains several <img> tags with the src attribute. I've implemented the _get_images function, which uses BeautifulSoup's find_all("img") method to retrieve the image URLs. However, I'm facing an issue where some image URLs are returned as None even though the src attribute is present in the HTML.
Here's my _get_images function:
def _get_images(self, soup):
    article_images = []
    images = soup.find_all("img")
    for img in images:
        src = img.get('src')
        article_images.append(src)
    return article_images
The output I get shows that some URLs are None, while others are correctly retrieved. I have checked the HTML structure, and the <img> tags do contain the src attribute.
What could be causing this problem, and how can I resolve it to fetch all the image URLs correctly? My goal is to have a list of URLs, where each entry is the src of an image, and to ensure that no None values are present in the list.
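To make the goal concrete, here is a minimal sketch of one way such a list could be built. It rests on an assumption that has not been confirmed against the actual page: that the parsed HTML contains some <img> tags with no src at all, or with the URL stored in a lazy-loading attribute instead. The attribute names data-src and data-original, the helper name get_image_urls, and the sample HTML are all hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Attributes to try, in order. "data-src" and "data-original" are assumed
# lazy-loading attribute names, not something taken from the page in question.
CANDIDATE_ATTRS = ("src", "data-src", "data-original")


def get_image_urls(soup):
    """Collect one URL per <img>, skipping tags that have no usable attribute."""
    urls = []
    for img in soup.find_all("img"):
        for attr in CANDIDATE_ATTRS:
            value = img.get(attr)  # returns None when the attribute is absent
            if value:
                urls.append(value)
                break  # stop at the first attribute that holds a value
    return urls


# Hypothetical sample HTML: one normal image, one lazy-loaded, one with no source.
html = """
<img src="a.jpg">
<img data-src="b.jpg">
<img alt="no source">
"""
soup = BeautifulSoup(html, "html.parser")
print(get_image_urls(soup))  # ['a.jpg', 'b.jpg']
```

Printing each img tag inside the loop (for example print(img.attrs)) is a quick way to check which attributes the parsed tags really carry, since the HTML seen in a browser can differ from what was downloaded.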