1
import urllib.request,io
url = 'http://www.image.com/image.jpg'

path = io.BytesIO(urllib.request.urlopen(url).read())

I'd like to check the file size of the URL image in the filestream path before saving. How can I do this?

Also, I don't want to rely on Content-Length headers. I'd like to fetch the image into a filestream, check the size, and then save it.

wolfgang3
  • Possible duplicate: http://stackoverflow.com/questions/5909/get-size-of-a-file-before-downloading-in-python – Ihor Pomaranskyy Mar 30 '15 at 07:43
  • Why the need to not rely on Content-Length headers? You can check the size of a `BytesIO` object the same way you can with any open file object, by seeking to the end and calling `fobj.tell()`. But if you use the Content-Length headers you can *avoid having to read the whole image into memory first*. – Martijn Pieters Mar 30 '15 at 08:32

3 Answers

2

Try importing urllib.request and reading the Content-Length header from the response:

import urllib.request, io
url = 'http://www.elsecarrailway.co.uk/images/Events/TeddyBear-3.jpg'
path = urllib.request.urlopen(url)
meta = path.info()

>>> meta.get("Content-Length")
'269898'  # size in bytes, i.e. about 270 kB
itzMEonTV
2

You can get the size of the io.BytesIO() object the same way you can get it for any file object: by seeking to the end and asking for the file position:

path = io.BytesIO(urllib.request.urlopen(url).read())
path.seek(0, 2)  # 0 bytes from the end
size = path.tell()

However, you could just as easily take the len() of the bytestring you read, before wrapping it in an in-memory file object:

data = urllib.request.urlopen(url).read()
size = len(data)
path = io.BytesIO(data)

Note that this means your image has already been loaded into memory. You cannot use this to prevent loading too large an image object. For that, the Content-Length header is the only way to know the size up front.
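
For the "check the size, then save" step from the question, here is a minimal sketch. The helper names `stream_size` and `save_if_within`, and the `max_bytes` limit, are made up for illustration and are not part of any library. The one thing to watch is that seeking to the end leaves the stream positioned there, so it has to be rewound before writing it out.

```python
import io


def stream_size(fobj):
    """Return the total size of a seekable binary stream, keeping its position."""
    position = fobj.tell()
    fobj.seek(0, io.SEEK_END)   # same as seek(0, 2)
    size = fobj.tell()
    fobj.seek(position)         # restore where the caller was
    return size


def save_if_within(fobj, dest, max_bytes):
    """Write the stream to dest only if it is at most max_bytes long.

    Hypothetical helper: returns True if the file was saved, False otherwise.
    """
    if stream_size(fobj) > max_bytes:
        return False
    fobj.seek(0)                # rewind so the whole content is written
    with open(dest, "wb") as out:
        out.write(fobj.read())
    return True
```

With the `path` object from above, this could be called as `save_if_within(path, "image.jpg", 5 * 1024 * 1024)`; the file name and the 5 MiB limit are arbitrary placeholders.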

If the server uses a chunked transfer encoding to facilitate streaming (so no content length has been set up front), you can use a loop to limit how much data is read.
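
Such a loop might look roughly like the sketch below. The function name `read_limited`, the `max_bytes` cap and the 64 KiB chunk size are assumptions chosen for illustration; it works on any object with a `read(n)` method, which includes the response returned by `urllib.request.urlopen()`.

```python
import io


def read_limited(stream, max_bytes, chunk_size=64 * 1024):
    """Read a binary stream into a BytesIO, giving up once max_bytes is exceeded.

    Hypothetical helper: raises ValueError as soon as more than max_bytes
    have arrived, so an oversized body is never fully held in memory.
    """
    buf = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:           # empty bytes means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("response exceeds %d bytes" % max_bytes)
        buf.write(chunk)
    buf.seek(0)                 # rewind so the caller can read from the start
    return buf
```

Usage would be along the lines of `path = read_limited(urllib.request.urlopen(url), 5 * 1024 * 1024)`, catching `ValueError` to skip images that are too large. At most `max_bytes` plus one chunk is read before the loop stops.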

Martijn Pieters
0

You could ask the server for the Content-Length information. Using urllib2 (Python 2 only, so I hope it is available in your Python):

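import urllib2

req = urllib2.urlopen(url)
meta = req.info()
# getheader() returns None when the server sent no Content-Length
length_text = meta.getheader("Content-Length")
try:
    length = int(length_text)
except (TypeError, ValueError):
    # length unknown, you may need to read the body to find out
    length = -1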
llogiq