If you are just trying to get the content length of a file by URL, you can do so by requesting only the HTTP headers (an HTTP HEAD request) and checking the Content-Length field:
import requests
url = 'https://commons.wikimedia.org/wiki/File:Leptocorisa_chinensis_(20566589316).jpg'
# A HEAD request returns only the response headers, never the body.
# requests.head() doesn't follow redirects by default, so enable that explicitly.
http_response = requests.head(url, allow_redirects=True)
print(f"Size of image {url} = {http_response.headers['Content-Length']} bytes")
However, if the server compresses the response before sending it, the Content-Length field will contain the compressed size (the amount of data that will actually be transferred) rather than the uncompressed file size.
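You can get a rough indication of whether that is happening by also checking the Content-Encoding header; if it is set (e.g. to gzip or br), the reported length is the compressed transfer size. A small sketch of that check:
import requests
url = 'https://commons.wikimedia.org/wiki/File:Leptocorisa_chinensis_(20566589316).jpg'
response = requests.head(url, allow_redirects=True)
encoding = response.headers.get('Content-Encoding')   # e.g. 'gzip', 'br', or None
length = response.headers.get('Content-Length')       # may be absent for chunked responses
if encoding:
    print(f"{length} bytes on the wire, compressed with {encoding}")
else:
    print(f"{length} bytes, no compression indicated")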
To do this for every image on a given page, you could use the BeautifulSoup HTML parsing library to extract the URLs of the image links on the page and then check each one's size as follows:
from time import sleep
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup as Soup
url = 'https://en.wikipedia.org/wiki/Agent_Orange'
html = Soup(requests.get(url).text, 'html.parser')
# The hrefs are root-relative (e.g. /wiki/File:...), so resolve them against the page URL.
image_links = [urljoin(url, a['href']) for a in html.find_all('a', {'class': 'image'})]
for img_url in image_links:
    # Again, a HEAD request is enough to read the Content-Length header.
    response = requests.head(img_url, allow_redirects=True)
    try:
        print(f"Size of image {img_url} = {response.headers['Content-Length']} bytes")
    except KeyError:
        print(f"Server didn't specify content length in headers for {img_url}")
    sleep(0.5)  # be polite and don't hammer the server
You'll have to adjust this to your specific problem, and you might have to pass other parameters to html.find_all() to narrow it down to the specific images you're interested in, but something along these lines will achieve what you're trying to do.
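For example, one way to narrow the results (purely an illustration; the 'image' class and the extension filter are assumptions about the page you're scraping) is to keep only the anchors whose href looks like a raster image:
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup as Soup
url = 'https://en.wikipedia.org/wiki/Agent_Orange'
html = Soup(requests.get(url).text, 'html.parser')
# href=True skips anchors without an href; the extension filter is just an illustrative narrowing step.
wanted = ('.jpg', '.jpeg', '.png', '.gif')
image_links = [urljoin(url, a['href'])
               for a in html.find_all('a', {'class': 'image'}, href=True)
               if a['href'].lower().endswith(wanted)]
print(image_links)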