
I'd like a sanity check on this Python script. My goal is to feed in a list of URLs and get a byte size for each, giving me an indicator of whether the URL is good or bad.

import urllib2
import shutil

urls = (LIST OF URLS)

def getUrl(urls):
    for url in urls:
        file_name = url.replace('https://','').replace('.','_').replace('/','_')
        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.args
        print urls, len(response.read())
        with open(file_name,'wb') as out_file:
            shutil.copyfileobj(response, out_file)
getUrl(urls)

The problem I am having is that my output looks like:

(LIST OF URLS) 22511
(LIST OF URLS) 56472
(LIST OF URLS) 8717
...

How would I make only one url appear with the byte size?
Is there a better way to get these results?

Jon Phillips

2 Answers


Try

print url, len(response.read())

Instead of

print urls, len(response.read())

You are printing the list each time. Just print the current item.
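
In context, the relevant part of the loop would look like this (a sketch based on the question's code, with a continue added so a failed request isn't read):

def getUrl(urls):
    for url in urls:
        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            print e.code
            continue
        except urllib2.URLError, e:
            print e.args
            continue
        # url is the current item; urls is the whole list
        print url, len(response.read())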

There are some alternative ways to determine a page's size, described here and here; there is little point in duplicating that information here.

Edit

Perhaps you would consider using requests instead of urllib2.

You can easily extract just the Content-Length header from a HEAD request and avoid a full GET, e.g.

import requests

# HEAD returns only the headers, not the body
h = requests.head('http://www.google.com')

print h.headers['content-length']
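
Adapted to a whole list of URLs like the one in the question, that could look something like this (a sketch; not every server sends a Content-Length header, so a fallback is used):

import requests

for url in urls:
    try:
        h = requests.head(url, allow_redirects=True)
    except requests.RequestException as e:
        print url, 'failed:', e
        continue
    # headers is case-insensitive; use a placeholder when the header is absent
    print url, h.headers.get('content-length', '?')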

A HEAD request using urllib2 or httplib2 is detailed here.
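
For reference, a minimal sketch of that approach with urllib2: the Request class has no direct HEAD option, so get_method is overridden (the URL is only a placeholder):

import urllib2

class HeadRequest(urllib2.Request):
    # urllib2 normally issues GET (or POST when data is supplied); force HEAD
    def get_method(self):
        return 'HEAD'

response = urllib2.urlopen(HeadRequest('http://www.google.com'))
# the response headers carry the size without downloading the body
print response.info().getheader('content-length')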

Paul Rooney

How would I make only one url appear with the byte size?

Obviously: don't

print urls, ...

but

print url, ...
Marcus Müller