I have the following script that prints the `src` path and size of every image on a given URL:
```python
from bs4 import BeautifulSoup
from PIL import Image
import requests

url = "https://example.com/"

# Fetch the page and collect every <img> tag
b = requests.get(url)
soup = BeautifulSoup(b.text, "lxml")
images = soup.find_all('img')

for img in images:
    if img.has_attr('src'):
        # Stream the image and let Pillow read its header to get (width, height)
        imgsize = Image.open(requests.get(img['src'], stream=True).raw)
        print(img['src'], imgsize.size)
```
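One thing worth ruling out when a script like this works on some pages but not others: `img['src']` is often relative (`img/a.png`, `/logo.png`) or protocol-relative (`//cdn.example.com/a.png`), and `requests.get` needs an absolute URL. A minimal sketch using the standard library's `urllib.parse.urljoin`; the helper name `resolve_src` is hypothetical. Note that `urljoin` leaves `data:` URIs untouched, so those would still need to be skipped before calling `requests.get`.

```python
from urllib.parse import urljoin


def resolve_src(page_url: str, src: str) -> str:
    """Return an absolute URL for an <img> src found on page_url (hypothetical helper)."""
    # Absolute URLs come back unchanged; relative ("img/a.png", "/a.png") and
    # protocol-relative ("//cdn.host/a.png") ones are resolved against page_url.
    return urljoin(page_url, src.strip())
```

In the loop this would be used as `requests.get(resolve_src(url, img['src']), stream=True)`.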
It works fine for some URLs, but for others I get the following error:
```
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x10782e900>
```
Is there a way to overcome this error?
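A hedged sketch of one way to keep the loop running instead of crashing. It assumes the failing URLs return something that is not a raster image this Pillow build can decode, for example an SVG (Pillow does not read SVG) or an HTML error page. The sketch downloads the body with `response.content`, which requests decodes from gzip/deflate (unlike `.raw`, which hands over the bytes as sent). It then opens the bytes from a `BytesIO` and catches `PIL.UnidentifiedImageError`, printing the `Content-Type` so you can see what actually came back. The helper names `image_size_from_bytes` and `fetch_image_size` are hypothetical.

```python
from io import BytesIO
from typing import Optional, Tuple

import requests
from PIL import Image, UnidentifiedImageError


def image_size_from_bytes(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) if Pillow can decode `data`, otherwise None."""
    try:
        with Image.open(BytesIO(data)) as img:
            return img.size  # taken from the header; pixel data is not decoded
    except UnidentifiedImageError:
        # Not a raster format this Pillow build understands
        # (e.g. SVG, or an HTML page served instead of an image).
        return None


def fetch_image_size(url: str, timeout: float = 10.0) -> Optional[Tuple[int, int]]:
    """Download `url` and return its pixel size, or None if it is not a decodable image."""
    resp = requests.get(url, timeout=timeout)
    if not resp.ok:
        print(f"skipping {url}: HTTP {resp.status_code}")
        return None
    # resp.content is already gzip/deflate-decoded, unlike resp.raw
    size = image_size_from_bytes(resp.content)
    if size is None:
        print(f"skipping {url}: Content-Type is {resp.headers.get('Content-Type')!r}")
    return size
```

Inside the original loop, the `Image.open(...)` line could become `size = fetch_image_size(img['src'])`, printing only when `size` is not `None`.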