I'm trying to get the source code of a page by using:

import urllib2
url = "http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data

I also tried sending a User-Agent header, but I still could not get the source code of the page.

Does anyone have an idea what can be done? Thanks in advance.

user2546923

3 Answers

I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You could probably get around that with urllib2, but I think the easiest way is to use the requests library (if you don't mind an additional dependency).

To install requests:

pip install requests

And then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)

I'm pretty sure the source code of the page will be what you expect then.
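
For reference, the cookie check can probably also be handled with the standard library alone. Below is a minimal sketch, assuming Python 3 (where the urllib2 functionality lives in urllib.request) and assuming the site only needs cookies to be stored and sent back. The names build_cookie_opener and fetch and the User-Agent value are illustrative, not something from the question, and I have not verified this against the site itself.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Hypothetical User-Agent value; replace with whatever the site accepts.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(url, opener, timeout=10):
    """Fetch `url` through `opener` and return the raw response bytes."""
    with opener.open(url, timeout=timeout) as page:
        return page.read()
```

Calling opener, jar = build_cookie_opener() and then fetch(url, opener) more than once lets later requests send back any cookies set by earlier ones.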

Martin Maillard

The requests library worked for me, as Martin Maillard showed.

In another thread I also noticed this note by leoluk:

Edit: It's 2014 now, and most of the important libraries have been ported and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.

So I wrote this get_page procedure:

import requests

def get_page(website_url):
    # Fetch the page and return its raw body (bytes)
    response = requests.get(website_url)
    return response.content

print(get_page('http://example.com'))
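
A slightly more defensive variant is sketched below. It assumes you may want to reuse a requests.Session so that cookies persist between calls, and that you want decoded text rather than bytes. The session and timeout parameters and the 10-second default are my own illustrative choices.

```python
import requests


def get_page(website_url, session=None, timeout=10):
    """Return the decoded body of `website_url`, raising on HTTP errors."""
    # Reuse the caller's session if one is given, so cookies persist.
    http = session if session is not None else requests.Session()
    response = http.get(website_url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text  # str; response.content would be the raw bytes
```

Passing the same requests.Session() to several get_page calls keeps any cookies the site sets along the way.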

Cheers!

Sergeus

I tried a lot of things, including urllib and urllib2, but one library worked for everything I needed and solved every problem I faced: Mechanize. It simulates a real browser, so it handles a lot of issues in that area.

Ibrahim Awad