Unable to get page source code in Python

Question

I'm trying to get the source code of a page by using:

import urllib2
url = "http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data

I also tried sending a User-Agent header, but I still did not get the page's source code.

Do you have any idea what can be done? Thanks in advance.
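For reference, here is a rough Python 3 sketch of the same request with a User-Agent header. In Python 3, `urllib2.urlopen` lives in `urllib.request`. The User-Agent string is an arbitrary browser-like placeholder (an assumption, not something the site is known to require), and `build_request` and `fetch` are hypothetical helper names. A header alone may still not be enough if the site insists on cookies.

```python
import urllib.request

# Arbitrary browser-like User-Agent string (an assumption; any realistic value works).
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Python 3 equivalent of urllib2.urlopen(url).read(), with a User-Agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as page:
        return page.read()  # raw bytes; decode them before treating them as text
```

`print(fetch(url))` would then take the place of the last three lines of the question's script.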

What you're getting is not the complete source code! Try opening the page in a browser and you will see the difference. - user2546923, Jul 03 '13 at 15:08

Martin Maillard · Answer 1 · 2016-10-29T08:18:22.277

I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You could probably get around that with urllib2, but I think the easiest way is to use the requests library (if you don't mind an additional dependency).

To install requests:

pip install requests

And then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)

I'm pretty sure you will then get the page source you expect.
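The answer above mentions that the cookie check could probably be handled with urllib2 as well. A minimal Python 3 sketch of that idea using only the standard library follows. An opener built with `HTTPCookieProcessor` stores cookies from each response in a `CookieJar` and sends them back on later requests through the same opener. It assumes (unverified for this site) that the first response sets the cookie the server checks for, so a second request gets the real page. If the check depends on JavaScript instead, plain HTTP clients like this will not get past it. `build_cookie_opener` and `fetch_with_cookies` are hypothetical names.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): cookies set by responses are kept in jar and resent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Headers in addheaders are attached to every HTTP(S) request from this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_cookies(url, attempts=2, timeout=10):
    """Request url `attempts` times with one opener and return the last body.

    Assumption: the first response sets the cookie the server checks for,
    so the second request (which sends that cookie back) gets the real page.
    """
    opener, _jar = build_cookie_opener()
    body = b""
    for _ in range(attempts):
        with opener.open(url, timeout=timeout) as response:
            body = response.read()
    return body
```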

score 2 · Answer 2 · edited May 23 '17 at 12:00

The requests library worked for me, as Martin Maillard showed.

Also, in another thread I noticed this note by leoluk:

Edit: It's 2014 now, and most of the important libraries have been ported and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.

So I wrote this get_page function:

import requests

def get_page(website_url):
    # Return the raw response body as bytes
    response = requests.get(website_url)
    return response.content

print(get_page('http://example.com'))

Cheers!
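`get_page` above returns `response.content`, which is raw bytes (requests also exposes decoded text as `response.text`). For readers who would rather avoid the extra dependency, here is a rough standard-library counterpart that returns text. It decodes using the charset from the `Content-Type` header and falls back to UTF-8. That fallback is an assumption, since some pages declare their encoding only in a `<meta>` tag. `get_page_text` is a hypothetical name, and the User-Agent value is an arbitrary placeholder.

```python
import urllib.request


def get_page_text(website_url, timeout=10):
    """Fetch website_url with the standard library and return decoded text."""
    # Arbitrary browser-like User-Agent (placeholder assumption).
    request = urllib.request.Request(website_url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Charset from the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```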

score 0 · Answer 3 · answered Jul 03 '13 at 17:00

I tried a lot of things, including urllib and urllib2, but the one thing that worked for everything I needed and solved every problem I faced was Mechanize. This library simulates a real browser, so it handles many issues in that area.
