31

How can I get Python to fetch the contents of an HTTP page? So far all I have is the request, and I have imported http.client.

agf
BiscottiGummyBears

6 Answers

56

Using urllib.request is probably the easiest way to do this:

import urllib.request

# urlopen() returns a response object; read() returns the body as bytes
f = urllib.request.urlopen("http://stackoverflow.com")
print(f.read())
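
Note that read() returns bytes, so the snippet above prints a bytes literal rather than plain text. Here is a minimal sketch of getting a str instead, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 otherwise. The helper name fetch_text and its default arguments are illustrative and not part of the answer:

```python
import urllib.request


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Return the body of `url` as a str (hypothetical helper for illustration)."""
    # The context manager closes the response when the block exits.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body as bytes
        # Use the charset named in the Content-Type header, if there is one.
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" avoids an exception if the declared charset is wrong.
    return raw.decode(charset, errors="replace")
```

Calling fetch_text("http://stackoverflow.com") would then return the page source as a string.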
Greg Hewgill
14

Using the built-in module "http.client"

import http.client

connection = http.client.HTTPSConnection("api.bitbucket.org", timeout=2)
connection.request('GET', '/2.0/repositories')
response = connection.getresponse()
print('{} {} - a response on a GET request by using "http.client"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "http.client"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the third-party library "requests"

import requests

response = requests.get("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "requests"'.format(response.status_code, response.reason))
content = response.content.decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "requests"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the built-in module "urllib.request"

import urllib.request

response = urllib.request.urlopen("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "urllib.request"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "urllib.request"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Notes:

  1. Python 3.4
  2. The responses will most likely differ only in their content
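
One practical difference between these approaches is worth keeping in mind: urllib.request.urlopen raises urllib.error.HTTPError for 4xx/5xx responses, whereas http.client simply returns the response and leaves the status check to you. Here is a rough sketch of handling that with the standard library. The helper name read_or_none and its timeout default are made up for illustration:

```python
import urllib.error
import urllib.request


def read_or_none(url, timeout=5):
    """Return the response body as bytes, or None if the request fails.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print("HTTP error:", err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, unknown scheme.
        print("Could not complete the request:", err.reason)
    return None
```

HTTPError is a subclass of URLError, so it has to be caught first for the two cases to be reported separately.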
Quentin
PADYMKO
2

You can also use the requests library. I found this particularly useful because it made it easier to retrieve and display the HTTP headers.

import requests

source = 'http://www.pythonlearn.com/code/intro-short.txt'

r = requests.get(source)

print('Display actual page\n')
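# Iterating over r directly yields fixed-size byte chunks, not lines,
# so split the decoded text into lines instead.
for line in r.text.splitlines():
    print(line.strip())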

print('\nDisplay all headers\n')
print(r.headers)
dimsum88
1

pip install requests

import requests

r = requests.get('https://api.spotify.com/v1/search?type=artist&q=beyonce')
print(r.json())  # parse the response body as JSON and print it
Anthony Awuley
0

Add this line to decode the response bytes into a human-readable string (f is the response object returned by urlopen):

text = f.read().decode('utf-8')
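
To see why the decode step matters: read() returns bytes, and printing bytes shows escape sequences rather than readable characters. Here is a small offline sketch, using a hard-coded byte string as an assumed stand-in for what f.read() might return:

```python
# Stand-in for the bytes that f.read() would return (assumed sample data).
raw = b"<title>Caf\xc3\xa9 menu</title>"

# Printing bytes shows non-ASCII characters as escapes such as \xc3\xa9.
print(raw)

# decode() turns the bytes into a str using the given encoding.
text = raw.decode("utf-8")
print(text)
```

This assumes the page really is UTF-8 encoded. A page served in another encoding would need that encoding's name passed to decode() instead.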
kenorb
SKGoC
0

https://stackoverflow.com/a/41862742/8501970 Check this out instead. It's about the same issue you have, and the solution is very simple and takes very few lines of code. This sure helped me when I realized Python 3 cannot simply use get_page.

This is a fine alternative. (Hope this helps, cheers!)

buda__