How can I read the contents of an URL with Python?

Question

The following works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344

But when I try reading the URL with Python nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)           
 myfile = f.readline()  
 print myfile

Do I need to encode the URL, or is there something I'm not seeing?

score 200 · Accepted Answer · edited Sep 24 '18 at 17:37

To answer your question:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

You need to read(), not readline()
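
As a minimal, offline sketch of the difference (Python 3): an `io.BytesIO` object stands in here for the response returned by `urlopen()`, and the HTML body is made up for illustration. A real response exposes the same `read()` and `readline()` methods.

```python
import io

# Made-up body standing in for what the server would send back.
body = b"<html>\n<head><title>Details</title></head>\n</html>\n"

# readline() stops after the first newline, so only one line comes back.
first_line = io.BytesIO(body).readline()

# read() keeps going until the end of the stream.
everything = io.BytesIO(body).read()

print(first_line)
print(everything)
```

If the first line of the page happens to be empty or whitespace, `readline()` can look as if "nothing happens", which may be what the question ran into.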

EDIT (2018-06-25): Since Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see notes from https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you're using Python 3, see the answers by Martin Thoma or i.n.n.m within this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compatible) and https://stackoverflow.com/a/45886824/158111 (Python 3).

Or, just get this library here: http://docs.python-requests.org/en/latest/ and seriously use it :)

import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)

edited Sep 24 '18 at 17:37

Asclepius

answered Feb 28 '13 at 14:59

woozyking

@KiranSubbaraman it's a really good project, from APIs to the code structure – woozyking Feb 04 '15 at 20:02
I also recommend and encourage programmers to use the brand-new `requests` module; using it yields more Pythonic code. – Hans Zimermann Jun 01 '17 at 23:41
I am getting the following error on Python 3.5.2: `Traceback (most recent call last): File "/home/lars/parser.py", line 9, in <module> f = urllib.urlopen(link) AttributeError: module 'urllib' has no attribute 'urlopen'` It seems there is no urlopen function in Python 3.5. Has it been renamed? EDIT: The snippet in the answer below solves it: `from urllib.request import urlopen` – Luatic Jun 25 '18 at 13:44
@user7185318 yes, in Python 3 the `urllib` package saw some refactoring and API changes. I'll update the answer to make clear that it targets Python 2. – woozyking Jun 25 '18 at 15:20
What if the provided link asks for a username and password? How should the code be changed then? – Dr. Essen Sep 17 '19 at 12:12
Only the last option using "requests" works properly as others ignore the text encoding. – Shautieh Mar 09 '20 at 11:59
Do not use this in a while loop. – greendino Oct 11 '20 at 08:17
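
On the comment above asking about a link that requires a username and password: a minimal sketch for Python 3, assuming the server uses HTTP Basic authentication. The helper name `make_auth_opener` and the credentials are hypothetical; nothing is sent over the network until `opener.open()` is called.

```python
import urllib.request


def make_auth_opener(url, username, password):
    """Build an opener that answers HTTP Basic auth challenges for url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm".
    manager.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


link = "http://www.somesite.com/details.pl?urn=2344"
opener = make_auth_opener(link, "alice", "s3cret")  # hypothetical credentials

# opener.open(link).read() would then supply the credentials
# when the server responds with a 401 Basic challenge.
```

With the `requests` library the equivalent is `requests.get(link, auth=("alice", "s3cret"))`.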

score 45 · Answer 2 · edited Sep 24 '18 at 17:38

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are other threads about the error NameError: name 'urlopen' is not defined, but I thought posting this here might save someone time.

edited Sep 24 '18 at 17:38

Asclepius

answered Aug 25 '17 at 17:38

i.n.n.m

This is not the best way to read data from a url using python3 because it misses out on the benefits of the 'with' statement. See my answer: https://stackoverflow.com/a/56295038/908316 – Freddie Aug 13 '20 at 11:37
No, this will not work in a while loop; it works for one call only, which sucks if you ask me. – greendino Oct 11 '20 at 08:18
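
The comments about while loops presumably refer to the fact that a response object can be read only once: a second `read()` on the same response returns an empty result. A minimal Python 3 sketch follows; the `data:` URL is a stand-in chosen only so the example runs offline, and an `http://` URL behaves the same way in this respect.

```python
from urllib.request import urlopen

# Stand-in URL (assumption): a data: URL avoids any network access.
link = "data:text/plain,hello"

f = urlopen(link)
first = f.read()    # consumes the whole body
second = f.read()   # nothing is left on the same response object
f.close()

# To fetch repeatedly, open a fresh response on every iteration.
bodies = []
for _ in range(3):
    with urlopen(link) as response:
        bodies.append(response.read())

print(first, second, bodies)
```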

Freddie · Answer 3 · 2020-08-13T12:00:12.880

20

None of these answers are very good for Python 3 (tested on latest version at the time of this post).

This is how you do it...

import urllib.error
import urllib.request

try:
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

The above assumes the response body is encoded as UTF-8. If you remove .decode('utf-8'), f.read() returns the raw bytes; Python does not guess the encoding for you.
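
One way to avoid hard-coding the encoding is to use the charset the server declares in its Content-Type header and fall back to UTF-8 only when none is given. The sketch below is one possible approach for Python 3, not the only one; `fetch_text` is a hypothetical helper name, and the `data:` URL is used only so the example runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, fallback="utf-8"):
    """Fetch url and decode the body with the charset declared by the response."""
    with urlopen(url, timeout=10) as response:
        raw = response.read()
        # get_content_charset() returns None when no charset is declared.
        charset = response.headers.get_content_charset() or fallback
    # errors="replace" keeps going if a byte does not fit the charset.
    return raw.decode(charset, errors="replace")


# Stand-in URL (assumption): the body is the Latin-1 bytes for "café".
text = fetch_text("data:text/plain;charset=iso-8859-1,caf%E9")
```

The same function can be called with an ordinary `http://` or `https://` URL.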

Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request

edited Aug 13 '20 at 12:00

answered May 24 '19 at 14:50

Freddie

Thanks, the original code was written for Python 2, but your contribution here has been noted. – Helen Neely May 25 '19 at 11:22

Martin Thoma · Answer 4 · 2015-05-03T12:38:46.077

12

A solution that works with both Python 2.x and Python 3.x uses the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

edited May 03 '15 at 12:38

answered Jan 20 '15 at 08:17

Martin Thoma

score 1 · Answer 5 · answered Mar 08 '18 at 09:21

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

answered Mar 08 '18 at 09:21

Akash Kinwad

This is the same as answer from @i.n.n.m. – PM0087 May 25 '18 at 11:51

score 1 · Answer 6 · answered Aug 24 '19 at 07:14

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2.
# Use this when the server does not insist on a browser-like User-Agent header.

import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is needed because the Python 2 urlopen() result is not a context manager.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)

# When the server rejects requests that lack a browser-like User-Agent header.
# Works on Python 3.

import urllib.request

user_agent = \
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
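
To check which headers will be sent before making the call, the `Request` object can be inspected without touching the network. This is a small Python 3 sketch; `build_request` is a hypothetical helper and the User-Agent string is an arbitrary example value.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request carrying a custom User-Agent header; nothing is sent yet."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request(
    "https://www.facebook.com/",
    "Mozilla/5.0 (compatible; example-script)",  # arbitrary example value
)

# Request stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

Passing `request` to `urllib.request.urlopen()` then performs the actual fetch.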

score 0 · Answer 7 · edited May 16 '20 at 10:15

from urllib.request import urlopen

# If the page contains Chinese (or other non-ASCII) text, decode the bytes; this assumes the page is UTF-8
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)

edited May 16 '20 at 10:15

codedge

answered May 16 '20 at 07:59

荷兰哲学家Elvira

Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you’ve made. – codedge May 16 '20 at 10:14

Nirmal Sankalana · Answer 8 · 2023-03-02T15:34:31.413

0

import requests
from bs4 import BeautifulSoup

link = "https://www.timeshighereducation.com/hub/sinorbis"

res = requests.get(link)
if res.status_code == 200:
    # BeautifulSoup expects markup (a string or bytes), not the Response object
    soup = BeautifulSoup(res.text, 'html.parser')

    # get the text content of the webpage
    text = soup.get_text()

    print(text)

Using BeautifulSoup's HTML parser, we can extract the text content of the web page.

edited Mar 02 '23 at 15:34

answered Feb 27 '23 at 09:47

Nirmal Sankalana

score -1 · Answer 9 · answered Aug 22 '17 at 11:00

I used the following code:

import urllib

# Python 2 only: urllib.urlopen() and the print statement do not exist in Python 3
def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()

answered Aug 22 '17 at 11:00

ggglni

score -1 · Answer 10 · answered Nov 27 '19 at 07:37

# retrieving data from url
# only for python 3

import urllib.request

def main():
    url = "http://docs.python.org"

    # retrieving data from URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))

    # print data from URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)

if __name__ == "__main__":
    main()

score -2 · Answer 11 · answered Feb 28 '13 at 14:58

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)           
myfile = f.readline()  
print myfile

answered Feb 28 '13 at 14:58

ATOzTOA

Both ' and " are strings in Python – Leo Jul 25 '15 at 13:01
