
I'm just starting out in Python and I'm trying to request the HTML source code of a site using urllib2. However, when I try to get the HTML content from a site, I'm not getting all of it; some tags are missing. I know they're missing because they show up when I inspect the page in Firebug. Is this due to the way I'm requesting the data, or due to the site? If it's the site, is there a way to get the full source code of the site in Python and then parse it?

Here is the code I'm currently using, along with the site I'm trying to fetch:

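# Python 2 (in Python 3, urllib2 became urllib.request)
import urllib2

url = 'http://marinetraffic.com/ais/'
response = urllib2.urlopen(url)  # send a GET request to the site
html = response.read()           # the raw HTML exactly as the server sent it
print(html)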

Specifically, the content inside <div id="map_area"> is missing. Any help or pointers greatly appreciated!

Joe

2 Answers


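You are getting incomplete data because most of the content on this page is generated dynamically by JavaScript. urllib2 only downloads the HTML the server sends and never runs scripts, so anything the page's scripts insert afterwards (such as the contents of map_area) won't be in response.read(). Firebug, by contrast, shows the live page after those scripts have run.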
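To confirm this diagnosis, you can check what the server-sent HTML actually contains inside map_area. Below is a minimal Python 3 sketch using the standard-library html.parser module; ElementTextExtractor and text_of_element are hypothetical names, and the sample markup in the usage note is made up rather than taken from marinetraffic.com. If it returns an empty string for the real page, the div is being filled in by JavaScript.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1              # a nested element inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1               # we just entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of_element(html, element_id):
    """Return the element's text as it appears in the raw HTML, or None if the id is absent."""
    parser = ElementTextExtractor(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None
```

For example, text_of_element('<div id="map_area"></div><script>drawMap()</script>', "map_area") returns an empty string: the div exists in the source, but whatever the script draws into it is not there. On Python 3 you could feed it the decoded body of urllib.request.urlopen(url).read(). To get the script-generated content itself, you would need a tool that actually executes JavaScript, or to request the same data the page's scripts fetch.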

plaes

Calling read() on the object returned by urlopen only gives you what was actually received, so if the transfer is cut off you can end up with a short read. You're better off using urllib.urlretrieve(), which tries to fetch the entire file, checks the Content-Length header, and raises urllib.ContentTooShortError if it received less data than expected.
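If you'd rather keep the page in memory than write it to a file, the same completeness check can be done by hand. This is a minimal Python 3 sketch under that assumption: read_complete is a hypothetical helper, not a library function. In Python 3, urlretrieve lives in urllib.request and raises urllib.error.ContentTooShortError, which the sketch reuses so callers can handle both cases the same way.

```python
from urllib.error import ContentTooShortError


def read_complete(response, chunk_size=8192):
    """Hypothetical helper: read a urlopen()-style response to EOF, then verify Content-Length.

    `response` only needs a read(n) method and a `headers` object with get(),
    like the HTTP response object urllib.request.urlopen() returns.
    """
    expected = response.headers.get("Content-Length")  # None if the server omits it
    chunks = []
    while True:
        chunk = response.read(chunk_size)
        if not chunk:          # b"" signals end of stream
            break
        chunks.append(chunk)
    body = b"".join(chunks)
    # Same rule urlretrieve applies: only complain when a length was advertised.
    if expected is not None and len(body) < int(expected):
        raise ContentTooShortError(
            "retrieval incomplete: got only %d out of %s bytes" % (len(body), expected),
            body,
        )
    return body
```

On Python 3 you would call it as `with urllib.request.urlopen(url) as response: html = read_complete(response)`. If the server sends no Content-Length header (for example with chunked transfer encoding), there is nothing to compare against and the check is skipped, which matches urlretrieve's behaviour.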

alexis