
How can I get the dimensions of an image without actually downloading it? Is it even possible? I have a list of URLs of images and I want to assign a width and height to each one.

I know there is a way of doing it locally (How to check dimensions of all images in a directory using python?), but I don't want to download all the images.

Edit:

Following ed.'s suggestions, I edited the code and came up with this. I'm not sure whether it downloads the whole file or just a part (as I wanted).

grotos
  • it's usually some header at the beginning of the file, so you can download only a few bytes; e.g. 6 bytes will be enough to get the dimensions of a jpeg: http://www.fastgraph.com/help/jpeg_header_format.html – max taldykin Sep 18 '11 at 08:10

11 Answers


I found the solution on this site to work well:

import urllib
import ImageFile  # ImageFile comes from PIL; this answer is Python 2

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size:
        size = int(size)
    p = ImageFile.Parser()
    while True:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            file.close()
            return size, p.image.size
    file.close()
    return size, None

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
# (10965, (179, 188))
jedierikb
  • Where does `ImageFile` come from? – Gocht Feb 11 '16 at 20:54
  • PIL -- python imaging library – jedierikb Feb 11 '16 at 20:57
  • Take care with the file descriptor in this code: if the image size is retrieved, the file is not closed. – Ivan De Paz Centeno Oct 03 '16 at 09:12
  • Can you elaborate @IvanDePazCenteno? Wouldn't a "file.close()" before the return size, p.image.size fix that? And also, is this even a problem? – Fabian Bosler Jul 16 '18 at 07:36
  • @FabianBosler yes, adding a `file.close()` before that line would do the trick, even though I would recommend using the `with` keyword to manage it, as it is a [good practice](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). Definitely yes, it is a problem. Not closing allocated resources will potentially become a disaster in certain contexts, for example in big loops. An allocated resource should always be closed, even if the OS or the interpreter itself can get rid of it. – Ivan De Paz Centeno Jul 16 '18 at 07:58
  • It is giving me following error. Have you used any other imports? "AttributeError: module 'urllib' has no attribute 'urlopen'" – Shagun Pruthi Dec 14 '18 at 08:19
  • @ShagunPruthi https://stackoverflow.com/questions/3969726/attributeerror-module-object-has-no-attribute-urlopen – jedierikb Dec 14 '18 at 14:06
  • @jedierikb : Gotcha! Thanks. – Shagun Pruthi Dec 17 '18 at 04:57
  • Thanks! Does `file.read(1024)` mean that it downloads only 1024 bytes instead of the whole image? – sound wave Jan 03 '22 at 12:28
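Following the comment thread above, the descriptor handling can be sketched with a context manager so the response is closed on every exit path. This is a Python 3 sketch of the same approach, assuming Pillow is installed; the function name mirrors the answer's:

```python
from contextlib import closing
from urllib.request import urlopen

from PIL import ImageFile

def getsizes(uri):
    # closing() guarantees the response is closed whether we return early or not
    with closing(urlopen(uri)) as file:
        size = file.headers.get("content-length")
        if size:
            size = int(size)
        p = ImageFile.Parser()
        while True:
            data = file.read(1024)  # read in 1 KB chunks
            if not data:
                break
            p.feed(data)
            if p.image:
                return size, p.image.size
    return size, None
```

This keeps the early `return` while still releasing the socket, which is the issue the comments discuss.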

This is just a Python 3+ adaptation of an earlier answer here.

from urllib import request as ulreq
from PIL import ImageFile

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = ulreq.urlopen(uri)
    size = file.headers.get("content-length")
    if size:
        size = int(size)
    p = ImageFile.Parser()
    while True:
        data = file.read(1024)
        if not data:
            break
        p.feed(data)
        if p.image:
            file.close()
            return size, p.image.size
    file.close()
    return size, None
Xiao
Alex P. Miller
  • You should obviously use `finally` for `file.close` (or better, a context manager), and not bother anyone with a `break` after `return`. – Victor Gavro Jan 09 '23 at 22:21

This is based on ed.'s answer, mixed with other things I found on the web. I ran into the same issue as grotos with `.read(24)`. Download getimageinfo.py from here and ReSeekFile.py from here.

import urllib2
import getimageinfo

imgdata = urllib2.urlopen(href)
image_type, width, height = getimageinfo.getImageInfo(imgdata)

Modify getimageinfo.py as follows (only the JPEG branch is shown; the rest of the function is unchanged). The key change is that the JPEG branch reads from the reseekable stream instead of the fixed-size `data` string:

import ReseekFile

def getImageInfo(datastream):
    datastream = ReseekFile.ReseekFile(datastream)
    data = str(datastream.read(30))

    # ... (GIF and PNG handling unchanged; skipping to the JPEG part) ...

    # handle JPEGs
    elif (size >= 2) and data.startswith('\377\330'):
        content_type = 'image/jpeg'
        datastream.seek(0)
        datastream.read(2)
        b = datastream.read(1)
        try:
            while (b and ord(b) != 0xDA):
                while (ord(b) != 0xFF): b = datastream.read(1)
                while (ord(b) == 0xFF): b = datastream.read(1)
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    datastream.read(3)
                    h, w = struct.unpack(">HH", datastream.read(4))
                    break
                else:
                    datastream.read(int(struct.unpack(">H", datastream.read(2))[0]) - 2)
                b = datastream.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass
ElJeffe
  • Nice work. I also ran into the same issue with the otherwise helpful response from ed – tohster May 05 '13 at 02:37
  • 2
    The source code for `getimageinfo.py` is not available anymore. Here is the code for anyone who is looking for it in future: https://gist.github.com/bmamouri/55ac6bfa7ba5eee03da2eb9e4f7469d9 – bman Jun 04 '16 at 19:18

If you're willing to download the first 24 bytes of each file, then this function (mentioned in johnteslade's answer to the question you mention) will work out the dimensions.

That's probably the least downloading necessary to do the job you want.

import urllib2
start = urllib2.urlopen(image_url).read(24)

Edit (1):

In the case of jpeg files it seems to need more bytes. You could edit the function so that, instead of reading from a `StringIO.StringIO(data)` object, it reads from the file handle returned by `urlopen`. Then it will read exactly as much of the image as it needs to find out the width and height.

ed.
  • Using this solution, especially .read(24), breaks that script. All works when I use read(). – grotos Sep 18 '11 at 08:56
  • It's basically the same as an example in the python docs (http://docs.python.org/library/urllib2.html). What error do you get using (24)? Just using read() (I guess you know) will download the whole file... – ed. Sep 18 '11 at 09:02
  • If I run with read(24) there is an error in the getImageInfo function: UnboundLocalError: local variable 'w' referenced before assignment – grotos Sep 18 '11 at 09:05
  • Hmm. Try running it with read(50) and see if it works. I think the error must be coming from the jpeg part of the function, so maybe it needs a few more bytes. – ed. Sep 18 '11 at 09:07
  • It is the same. I think it only works with read(X), where X is so large that it covers the whole file. – grotos Sep 18 '11 at 09:10
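One way around the "how many bytes is enough?" problem in the exchange above is to make the read incremental: feed Pillow's parser chunk by chunk until it recognizes the header, then stop. A sketch (the function name, the 1024-byte step, and the 64 KB cap are my own choices, not from any answer here):

```python
from PIL import ImageFile

def size_from_stream(read_chunk, max_bytes=65536, step=1024):
    # read_chunk(n) returns the next n bytes of the image stream,
    # e.g. the bound .read method of a urlopen response
    p = ImageFile.Parser()
    fed = 0
    while fed < max_bytes:
        data = read_chunk(step)
        if not data:
            break
        p.feed(data)
        fed += len(data)
        if p.image:            # header parsed: dimensions are known
            return p.image.size
    return None
```

With `urllib.request` this would be called as `size_from_stream(urlopen(image_url).read)`, downloading only as many chunks as the format needs rather than a fixed 24 or 50 bytes.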

Since getimageinfo.py mentioned above doesn't work in Python 3, Pillow is used instead.

Pillow can be found on PyPI, or installed using pip: `pip install pillow`.


from io import BytesIO
from PIL import Image
import requests

hrefs = ['https://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg',
         'https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg',
         'https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
RANGE = 5000  # number of bytes to request

for href in hrefs:
    req = requests.get(href, headers={'User-Agent': 'Mozilla5.0(Google spider)',
                                      'Range': 'bytes=0-{}'.format(RANGE)})
    im = Image.open(BytesIO(req.content))
    print(im.size)
woohaha
  • 2
    Doesn't this actually download the image? I believe that's what OP is trying to avoid – Ryan Pergent Mar 29 '17 at 16:01
  • 1
    I'd suggest to use requests' session to reuse TCP connection and use plain HTTP instead of HTTPS if possible. It may dramatically increase performance in some cases. – viator Nov 30 '18 at 08:46
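The Session suggestion in the comment above can be sketched like this: a `Session` keeps the TCP connection alive between requests to the same host and lets default headers such as `Range` be set once (the example URL and the 0-4999 range are arbitrary):

```python
import requests

# set the Range header once on the session; every request sent through
# this session inherits it, and the pooled connection is reused
session = requests.Session()
session.headers.update({'Range': 'bytes=0-4999'})

# preparing a request (without sending it) shows the merged headers
prepared = session.prepare_request(
    requests.Request('GET', 'http://example.com/img.jpg'))
print(prepared.headers['Range'])
```

In the loop above, `requests.get(href, ...)` would then become `session.get(href)`.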

Using the requests library:

To get the image size in bytes, only the headers from the website are needed (without downloading the image):

import requests

url = r"https://www.sulitest.org/files/source/Big%20image%20HD/elyx.png"

size = requests.get(url, stream=True).headers['Content-length']
print(size)
## output: 437495

## to see what other header data you can get:
allheaders = requests.get(url, stream=True).headers
print(allheaders)

To get the image (Width, Height):

We have to download part of the image and let an image library read the image header to retrieve/parse the (Width, Height). Here I'm using Pillow.

import requests
from PIL import ImageFile

resume_header = {'Range': 'bytes=0-2000000'}    ## the number of bytes to download
data = requests.get(url, stream=True, headers=resume_header).content

p = ImageFile.Parser()
p.feed(data)    ## feed the data to the image parser to get photo info from the header
if p.image:
    print(p.image.size)  ## the image size (Width, Height)
## output: (1400, 1536)
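One caveat with the `Range` approach above: not every server honors it. Whether the range was respected can be checked from the response status and headers; this small helper is my own addition, not part of the answer:

```python
def range_was_honored(status_code, headers):
    # a server that honors a Range request replies 206 Partial Content
    # and includes a Content-Range header; one that ignores Range
    # replies 200 and sends the whole body anyway
    return status_code == 206 and 'Content-Range' in headers
```

After `resp = requests.get(url, stream=True, headers=resume_header)`, calling `range_was_honored(resp.status_code, resp.headers)` tells you whether you actually saved bandwidth.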
Baraa

Unfortunately I can't comment, so I'm posting this as an answer:

Use a GET request with the header

"Range": "bytes=0-30"

And then simply use

http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

If you use Python's `requests`, it's simply:

r = requests.get(image_url, headers={
    "Range": "bytes=0-30"
})
image_info = get_image_info(r.content)  # get_image_info comes from the getimageinfo.py linked above

This fixes ed.'s answer and doesn't have any other dependencies (like ReSeekFile.py).

vincent

My fixed getimageinfo.py works with Python 3.4+:

import io
import struct
import urllib.request as urllib2

def getImageInfo(data):
    size = len(data)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack(b"<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
          and (data[12:16] == b'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(b">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data.startswith(b'\377\330'):
        content_type = 'image/jpeg'
        jpeg = io.BytesIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        try:
            while (b and ord(b) != 0xDA):
                while (ord(b) != 0xFF): b = jpeg.read(1)
                while (ord(b) == 0xFF): b = jpeg.read(1)
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    jpeg.read(3)
                    h, w = struct.unpack(b">HH", jpeg.read(4))
                    break
                else:
                    jpeg.read(int(struct.unpack(b">H", jpeg.read(2))[0])-2)
                b = jpeg.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass

    return content_type, width, height



#from PIL import Image
#import requests
#hrefs = ['http://farm4.staticflickr.com/3894/15008518202_b016d7d289_m.jpg','https://farm4.staticflickr.com/3920/15008465772_383e697089_m.jpg','https://farm4.staticflickr.com/3902/14985871946_86abb8c56f_m.jpg']
#RANGE = 5000
#for href in hrefs:
    #req  = requests.get(href,headers={'User-Agent':'Mozilla5.0(Google spider)','Range':'bytes=0-{}'.format(RANGE)})
    #im = getImageInfo(req.content)

    #print(im)
req = urllib2.Request("http://vn-sharing.net/forum/images/smilies/onion/ngai.gif", headers={"Range": "bytes=0-5000"})
r = urllib2.urlopen(req)
#f = open("D:\\Pictures\\1.jpg", "rb")
print(getImageInfo(r.read()))
# Output: >> ('image/gif', 50, 50)
#print(getImageInfo(f.read()))

Source code: http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

user3763937

It's not possible to do this directly, but there is a workaround. If the files are present on your own server, implement an API endpoint that takes an image name as an argument and returns its size.

But if the files are on a different server, you have no other way but to download the files.

plaes
  • Based on the other answers to this question, this seems to be an incorrect assertion. – Slater Victoroff Aug 26 '15 at 17:38
  • @SlaterTyranus No, all the other answers just suggest downloading the image (or parts of the image). This answer is the **most** correct, but the others are valid work-arounds. – GreySage Mar 03 '17 at 18:27
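The endpoint idea above can be sketched with a small server-side helper that an HTTP route would call. The function name, the directory layout, and the use of Pillow here are my own assumptions; Pillow only parses the header to get the size, it doesn't decode the whole image:

```python
from pathlib import Path
from PIL import Image

def image_size(image_dir, name):
    # look the file up on the server's own disk and read just its header
    path = Path(image_dir) / name
    if not path.is_file():
        return None  # the route would turn this into a 404
    with Image.open(path) as im:
        return {"width": im.size[0], "height": im.size[1]}
```

A route such as `GET /imagesize/<name>` would then return this dict as JSON, so clients never download the image at all.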

The shortest code I have come up with downloads only the first 1024 bytes. This can be set lower if you need to, but could cause problems with some image types.

from io import BytesIO
from urllib.request import urlopen
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # my problem had really big images

def get_image_size_from_url(url):
    response = urlopen(url)
    data = response.read(1024)  # only the first 1024 bytes
    response.close()
    img = Image.open(BytesIO(data))
    return img.size
Tom Nijhof
import requests
from PIL import Image
from io import BytesIO

url = 'http://farm4.static.flickr.com/3488/4051378654_238ca94313.jpg'

img_data = requests.get(url).content  # note: this downloads the entire image
im = Image.open(BytesIO(img_data))
print(im.size)
Yunhe