Get webpage contents with Python?

Question

I'm using Python 3.1, if that helps.

Anyways, I'm trying to get the contents of this webpage. I Googled for a little bit and tried different things, but they didn't work. I'm guessing that this should be an easy task, but...I can't get it. :/.

Results of urllib, urllib2:

>>> import urllib2
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import urllib2
ImportError: No module named urllib2
>>> import urllib
>>> urllib.urlopen("http://www.python.org")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'
>>>

Python 3 solution

Thank you, Jason. :D.

import urllib.request
page = urllib.request.urlopen('http://services.runescape.com/m=hiscore/ranking?table=0&category_type=0&time_filter=0&date=1519066080774&user=zezima')
print(page.read())
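
A note for later readers: in Python 3, `page.read()` returns bytes, so the `print` above shows a `b'...'` literal rather than text. Here is a minimal sketch of getting a string instead. It assumes the server declares its charset in the Content-Type header; the UTF-8 fallback and the 10-second timeout are my assumptions, and `fetch_text` is just a hypothetical helper name.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Fetch a URL and return its body decoded to str."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        raw = page.read()  # bytes in Python 3
        # Use the charset from the Content-Type header when one is declared.
        charset = page.headers.get_content_charset() or default_charset
    # errors="replace" keeps going if the declared charset turns out to be wrong.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_text('http://www.python.org/')` should then give you a `str` you can search or parse.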

Duplicate: Search for `urllib2` or `get web page [python]` on SO and you'll find hundreds of similar questions. – S.Lott, Dec 03 '09 at 22:26
Tried urllib2 and urllib, but neither worked. (Edited first post) — Andrew, Dec 03 '09 at 22:32
He's using Python 3, so the APIs are different. I surely learned something new by researching this answer. — Jason R. Coombs, Dec 03 '09 at 22:39
@Andrew: It helps to check the questions and answers carefully to see if they say Python 3 or not. If they don't say Python 3, they don't apply to you. — S.Lott, Dec 03 '09 at 22:40
For anyone looking for python 2, see http://stackoverflow.com/q/2289768/79125 (use urllib.urlopen) — idbrii, Mar 14 '13 at 22:40

Jonathan Hartley · Answer 1 · 2021-11-17T17:47:22.183

If your project can install packages from PyPI, then the most widely used library for this is requests. It provides lots of convenient but powerful features. Use it like this:

import requests
response = requests.get('http://hiscore.runescape.com/index_lite.ws?player=zezima')
print(response.status_code)
print(response.content)

But if your project does not install its own dependencies, i.e. it is limited to what is built into the standard library, then you should consult one of the other answers.
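
For the standard-library-only case, a rough counterpart to `status_code` and `content` could look like the sketch below. `fetch_status_and_body` is a hypothetical helper name, the 10-second timeout is an arbitrary assumption, and this is not meant to mirror how requests works internally.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, timeout=10):
    """Return (status, body_bytes); status is None when no HTTP status exists."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, which still carries a code and a body.
        return err.code, err.read()
    except urllib.error.URLError:
        # No response at all (DNS failure, refused connection, missing file, ...).
        return None, b""
```

Unlike requests, `urlopen` raises on 4xx/5xx responses, which is why the error branch is needed to see the status code at all.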

score 35 · Accepted Answer · edited Dec 12 '17 at 00:16

Because you're using Python 3.1, you need to use the new Python 3.1 APIs.

Try:

import urllib.request
urllib.request.urlopen('http://www.python.org/')

Alternately, it looks like you're working from Python 2 examples. Write it in Python 2, then use the 2to3 tool to convert it. On Windows, 2to3.py is in \python31\tools\scripts. Can someone else point out where to find 2to3.py on other platforms?

Edit

These days, I write Python 2 and 3 compatible code by using six.

from six.moves import urllib
urllib.request.urlopen('http://www.python.org')

Assuming you have six installed, that runs on both Python 2 and Python 3.

edited Dec 12 '17 at 00:16 by kevinSpaceyIsKeyserSöze
answered Dec 03 '09 at 22:38 by Jason R. Coombs

I'm on Windows. Anyways, thanks, it worked fine. (The page you linked me to looks very helpful, by the way. Thanks for that, especially.) – Andrew Dec 03 '09 at 22:42
On Ubuntu, it was in the path, so I just had to run the `2to3` command. Whereis says it is at `/usr/bin/2to3` – Azendale Dec 15 '12 at 18:29
Damn, python 3 is starting to become a problem: one can't just copy-paste the first stack overflow answer and expect it to work anymore ! – xApple Feb 01 '13 at 15:38
@xApple: The way I see it, Python 2 is starting to become a problem ;) – Jason R. Coombs Apr 01 '15 at 20:20
Using 'six' is a good idea if your code must work under both python 2 and python 3. This is only the case if you are writing a library to be used by others (and even then, caring about python2 is less and less common). If you are writing executable scripts or applications, especially for your own use, you can just pick one of python3 or python2, and use it exclusively, free of the complications 'six' introduces. – Jonathan Hartley Nov 17 '21 at 17:50

score 9 · Answer 3 · edited Dec 12 '15 at 05:37

If you ask me, try this one (Python 2 only, since urllib2 does not exist in Python 3):

import urllib2
resp = urllib2.urlopen('http://hiscore.runescape.com/index_lite.ws?player=zezima')

and read it the normal way, i.e.

page = resp.read()

Good luck though

edited Dec 12 '15 at 05:37 by Sumit
answered Nov 14 '13 at 09:02 by Zuko

score 5 · Answer 4 · answered Dec 03 '09 at 22:56

Mechanize is a great package for "acting like a browser", if you want to handle cookie state, etc.

http://wwwsearch.sourceforge.net/mechanize/
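
If a full browser emulator is more than you need, the standard library can keep cookie state across requests too. This is only a sketch using `http.cookiejar`, not Mechanize (whose API is not shown here), and `make_cookie_opener` is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Every request made through `opener.open(url)` then shares the same `jar`, so a login cookie set by one response is sent along with the next request.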

answered Dec 03 '09 at 22:56 by Joe Koberg

score 2 · Answer 5 · answered Dec 03 '09 at 22:29

You can use urllib2 and parse the HTML yourself.

Or try Beautiful Soup to do some of the parsing for you.
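
To illustrate the "parse it yourself" option, here is a small sketch that collects link targets with the standard library's `html.parser`. It is not Beautiful Soup, and `LinkCollector` and `extract_links` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

You would pass it the decoded page text (a `str`, not the raw bytes from `read()`).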

answered Dec 03 '09 at 22:29 by JasDev

Tried urllib2 and urllib, but neither worked. (Edited first post) – Andrew Dec 03 '09 at 22:32
Andrew, others can help you better if you describe in detail what you tried and what error message(s) / unexpected behaviour resulted. – micahwittman Dec 03 '09 at 22:35
I edited it into my initial post because I didn't want a huge comment. :P. – Andrew Dec 03 '09 at 22:37

score 2 · Answer 6 · answered Sep 21 '19 at 19:54

You can also use the faster_than_requests package. It's very fast and simple:

import faster_than_requests as r
content = r.get2str("http://test.com/")

answered Sep 21 '19 at 19:54 by Chalist

score 0 · Answer 7 · answered Jul 18 '16 at 03:38

A solution that works with Python 2.X and Python 3.X:

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

url = 'http://hiscore.runescape.com/index_lite.ws?player=zezima'
response = urlopen(url)
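# read() returns bytes on Python 3, and str() on bytes would give "b'...'".
# Decode instead; UTF-8 is an assumption about the page's encoding.
data = response.read().decode('utf-8')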

score 0 · Answer 8 · answered Sep 10 '18 at 18:18

Suppose you want to GET a webpage's content. The following code does it (Python 2 only: it uses the print statement and urllib.urlopen):

# -*- coding: utf-8 -*-
# python

# example of getting a web page

from urllib import urlopen
print urlopen("http://xahlee.info/python/python_index.html").read()

answered Sep 10 '18 at 18:18 by Swathi Bhuvaneshwar Babu
