I want to find the image that best represents a web page.

My code is:

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "http://www.test.com/"
r = requests.get(url)  # Download the page

data = r.text  # Page source as text

soup = BeautifulSoup(data, "html.parser")  # Name the parser explicitly to avoid the bs4 warning

links = []

for img in soup.find_all('img'):  # Iterate over all 'img' tags
    img_src = img.get('src')  # None when the tag has no 'src' attribute
    if img_src:
        links.append(urljoin(url, img_src))  # Resolve relative URLs against the page URL

print(links)  # Print the collected image URLs

I know these three methods might be useful:

Check for Open Graph/Twitter Card tags
Find the largest suitable image on the page
Look for a suitable video thumbnail if no image is found
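
The first method in the list above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not a definitive implementation: the names `PREFERRED_KEYS`, `MetaImageParser` and `find_meta_image` are hypothetical, and the order in which the meta keys are preferred is an assumption.

```python
from html.parser import HTMLParser

# Meta keys checked in order of preference (this ordering is an assumption).
PREFERRED_KEYS = ("og:image", "og:image:url", "twitter:image", "twitter:image:src")


class MetaImageParser(HTMLParser):
    """Collects the content of <meta> tags whose property/name is an image key."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property="...", Twitter Cards usually use name="...".
        key = (attr.get("property") or attr.get("name") or "").lower()
        content = attr.get("content")
        # Keep only the first occurrence of each key.
        if key in PREFERRED_KEYS and content and key not in self.found:
            self.found[key] = content


def find_meta_image(html):
    """Return the most preferred meta image URL found in the HTML, or None."""
    parser = MetaImageParser()
    parser.feed(html)
    for key in PREFERRED_KEYS:
        if key in parser.found:
            return parser.found[key]
    return None
```

With the code from the question, this would be called as `find_meta_image(r.text)`. A `None` result means the page declares no such tag and one of the other two methods is needed. With BeautifulSoup, `soup.find("meta", property="og:image")` should perform the equivalent lookup for the Open Graph case.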

I want to get the biggest images on a page, based on their dimensions.

I have researched this, but I couldn't find anything that works well and runs fast.
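
One way to compare dimensions without downloading every image in full is to read only the first few bytes of each file, since several formats store width and height in their header. The sketch below is an assumption-laden illustration, not a complete solution: `size_from_header` and `probe_remote_size` are hypothetical names, only PNG and GIF are handled (JPEG and WebP need more parsing), and a server may ignore the `Range` header.

```python
import struct
import urllib.request


def size_from_header(data):
    """Return (width, height) parsed from leading PNG or GIF bytes, else None."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then big-endian
    # 32-bit width and height.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR" and len(data) >= 24:
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        return struct.unpack("<HH", data[6:10])
    return None  # Other formats are not handled in this sketch.


def probe_remote_size(url, nbytes=32):
    """Fetch only the first bytes of an image URL and parse its size."""
    # The Range header requests a partial response; servers may ignore it,
    # so read(nbytes) also caps how much of the body is read.
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return size_from_header(resp.read(nbytes))
```

The largest image could then be picked by calling `probe_remote_size` on each absolute image URL, discarding `None` results, and taking the maximum of width times height. Relative `src` values would first need resolving, for example with `urllib.parse.urljoin`. For formats beyond PNG and GIF, Pillow's incremental `ImageFile.Parser` is one option that can report a size after being fed only part of a file.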

  • Have you looked at [Beautiful Soup: get picture size from html](https://stackoverflow.com/questions/36754686/beautiful-soup-get-picture-size-from-html)? – rassar Oct 30 '19 at 11:40
  • Dear @rassar, I have seen that page before. It is only usable when the width and height attributes are defined, and unfortunately most of the time they are not. – William Johnson Oct 30 '19 at 11:45
  • Then, unfortunately, you can't use BeautifulSoup for this, because BeautifulSoup can't read live DOM attributes. Have a look at Selenium: https://stackoverflow.com/questions/15510882/selenium-get-coordinates-or-dimensions-of-element-with-python – rassar Oct 30 '19 at 11:46
  • Okay, but what about using other Python libraries to check the dimensions of all the images? I think that would hurt the speed of the algorithm. – William Johnson Oct 30 '19 at 11:51
  • Is it possible to work only on the img tags located inside p tags, as post images? – William Johnson Oct 30 '19 at 12:12
  • This link might also be useful: https://stackoverflow.com/questions/51116907/beatifulsoup-how-to-get-image-size-by-url , but I want to find a way to get the image dimensions from their CSS information, without downloading all the images. – William Johnson Oct 30 '19 at 14:09
