urllib2 not returning full webpage

Question

I'm just starting out in Python and I'm trying to fetch the HTML source of a site using urllib2. However, the HTML I get back is incomplete: some tags are missing. I know they're missing because they show up when I inspect the site in Firebug. Is this due to the way I'm requesting the data, or due to the site? Either way, is there a way to get the full source of the page in Python and then parse it?

Currently the code I'm using to request the content and the site I'm trying is:

import urllib2

url = 'http://marinetraffic.com/ais/'
response = urllib2.urlopen(url)
html = response.read()
print(html)

Specifically, the content inside the <div id="map_area"> element is missing. Any help or pointers greatly appreciated!

This [related question](http://stackoverflow.com/q/8323728/183066) will be helpful. — jcollado, Mar 01 '12 at 13:50

score 4 · Accepted Answer · answered Mar 01 '12 at 13:23

You are getting incomplete data because most of the content on this page is dynamically generated via Javascript...
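One way to check this for yourself is to parse the raw response and see whether the element is present but empty. The sketch below assumes Python 3 and uses only the standard library's html.parser; `DivTextCollector`, `div_text`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is a made-up stand-in for a server response, not the actual page from the question.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside the first <div> that has the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # <div> nesting depth inside the target; 0 means outside
        self.found = False   # True once the target <div> has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside the target: track nested <div> elements.
            if tag == "div":
                self.depth += 1
        elif tag == "div" and not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    """Return (found, text) for the <div> with id == target_id."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    parser.close()
    return parser.found, "".join(parser.chunks).strip()


# Hypothetical static HTML: the <div> exists, but a script fills it in later.
SAMPLE_HTML = """<html><body>
<div id="map_area"></div>
<script>/* in a browser, a script would populate map_area */</script>
</body></html>"""

found, text = div_text(SAMPLE_HTML, "map_area")
print(found, repr(text))
```

If the element is found but its text is empty, the content is most likely added in the browser after the page loads, which a plain HTTP request will never see.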

plaes

score 0 · Answer 2 · answered Mar 01 '12 at 14:37

Calling read() on the object returned by urlopen() will only return what has already been downloaded, so you're liable to get a short read. You're better off using urllib.urlretrieve(), which tries to fetch the entire file, checks the Content-Length header, and raises an error if the download falls short.
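A minimal sketch of that approach, assuming Python 3, where the function lives at urllib.request.urlretrieve (the docs list it as a legacy interface) and the error is urllib.error.ContentTooShortError. `fetch_to_file` is a hypothetical helper, and the demo fetches a local file:// URL so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.error import ContentTooShortError
from urllib.request import urlretrieve


def fetch_to_file(url, dest):
    """Download url to dest. Return True if complete, False if the download was short."""
    try:
        urlretrieve(url, str(dest))
    except ContentTooShortError:
        # Fewer bytes arrived than the Content-Length header promised.
        return False
    return True


# Demo against a local file so the example does not depend on any website.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "page.html"
    source.write_text("<html><body>hello</body></html>")
    target = Path(tmp) / "copy.html"
    ok = fetch_to_file(source.as_uri(), target)
    copied = target.read_text()

print(ok, copied)
```

For a real page you would pass an http:// URL instead of the file:// one used here.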
