
What is the quickest way to HTTP GET in Python if I know the content will be a string? I am searching the documentation for a quick one-liner like:

contents = url.get("http://example.com/foo/bar")

But all I can find using Google are httplib and urllib - and I am unable to find a shortcut in those libraries.

Does standard Python 2.5 have a shortcut in some form like the above, or should I write a url_get function?

I would prefer not to capture the output of shelling out to wget or curl.
Frank Krueger
  • I thought I would pass this along, as it had me stumped for hours. I tried getting the text that visually appeared in the browser, but instead got snippets of a web app. The solution was to go into the browser Developer Tools, click on the Network tab, and reload the page. In the list of files that came over the network, I could see the text file I wanted. I could right-click on it and "Open in new tab" to verify. – Rick Shory Jun 28 '22 at 16:19

14 Answers


Python 3:

import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()

Python 2:

import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()

Documentation for urllib.request and read.

Nick Presta
  • Does everything get cleaned up nicely? It looks like I should call `close` after your `read`. Is that necessary? – Frank Krueger Mar 14 '09 at 03:49
  • It is good practice to close it, but if you're looking for a quick one-liner, you could omit it. :-) – Nick Presta Mar 14 '09 at 03:51
  • For what it's worth, the same thing works with urllib in place of urllib2 (at least for most URLs). – David Z Mar 14 '09 at 04:09
  • The object returned by urlopen will be deleted (and finalized, which closes it) when it falls out of scope. Because CPython is reference-counted, you can rely on that happening immediately after the `read`. But a `with` block would be clearer and safer for Jython, etc. – sah Dec 27 '13 at 21:05
  • It doesn't work with HTTPS-only websites. `requests` works fine – OverCoder Jul 16 '16 at 00:45
  • If you're using **Amazon Lambda** and need to get a URL, the 2.x solution is available and built-in. It does seem to work with https as well. It's nothing more than `r = urllib2.urlopen("http://blah.com/blah")` and then `text = r.read()`. It is sync, it just waits for the result in "text". – Fattie Dec 11 '16 at 18:24
  • What are the pros / cons compared to `requests`? – Martin Thoma Sep 18 '17 at 10:10
  • See [should I call close() after urllib.urlopen()?](https://stackoverflow.com/q/1522636/52677510) for a detailed explanation. – user202729 Aug 21 '18 at 15:45
  • @sah Consider posting that on the linked question above if you think it's useful. – user202729 Aug 21 '18 at 15:48
  • Technically 'contents' is a 'bytes' object, not a 'str'. You need the right encoding to convert it into a str; this is challenging to extract from the content. I've tried assuming utf-8 (eg contents = str(contents, 'utf-8'), but hit errors on web pages that claim they're utf-8. So this remains confusing/ambiguous... But for most cases we can treat the bytes as a string, so I guess that's good enough. I tried contents = str(contents) too, but this has the disadvantage of turning '\n' characters into separate slashes and ns. ('\\n'). – Apollo Grace Aug 13 '20 at 09:10
  • @ApolloGrace you can use the `errors` parameter for decode (ignore or replace might be helpful). See https://docs.python.org/3/library/stdtypes.html#bytes.decode – nijave May 18 '21 at 00:57
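Several comments above touch on closing the response and on `contents` being bytes rather than `str`. A minimal sketch of the `with`-block variant they suggest (the UTF-8 decoding is an assumption; real pages declare their own charset):

```python
import urllib.request

def http_get(url):
    # The with-block closes the response even if read() raises,
    # so no explicit close() call is needed.
    with urllib.request.urlopen(url) as resp:
        contents = resp.read()  # bytes
    # Decoding as UTF-8 is an assumption; pages declare their own charset.
    return contents.decode("utf-8")
```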

Use the Requests library:

import requests
r = requests.get("http://example.com/foo/bar")

Then you can do stuff like this:

>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)  # bytes
>>> print(r.text)     # r.content as str

Install Requests by running this command:

pip install requests
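One thing the snippet above glosses over: `requests.get` happily returns error pages and can hang indefinitely without a timeout. A small sketch of a more defensive variant (the 10-second timeout is an arbitrary choice, and `fetch_text` is a name I made up):

```python
import requests

def fetch_text(url):
    # A timeout prevents hanging forever; 10 seconds is arbitrary.
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    return r.text         # decoded str (r.content would be the raw bytes)
```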
Boris Verkhovskiy
  • Almost any Python library can be used in AWS Lambda. For pure Python, you just need to "vendor" that library (copy into your module's folders rather than using `pip install`). For non-pure libraries, there's an extra step -- you need to `pip install` the lib onto an instance of AWS Linux (the same OS variant lambdas run under), then copy those files instead so you'll have binary compatibility with AWS Linux. The only libraries you won't always be able to use in Lambda are those with binary distributions only, which are thankfully pretty rare. – Chris Johnson Sep 29 '17 at 10:27
  • @lawphotog this DOES work with python3, but you have to `pip install requests`. – akarilimano Feb 01 '18 at 10:29
  • Even the urllib2 standard library recommends requests – Asfand Qazi Jan 31 '19 at 11:54
  • In regards to Lambda: if you do wish to use requests in AWS Lambda functions, there is a preinstalled boto3 requests library: `from botocore.vendored import requests`; usage: `response = requests.get('...')` – kmjb Aug 21 '19 at 11:02
  • @kmjb borrowing requests from botocore has been deprecated https://aws.amazon.com/blogs/developer/removing-the-vendored-version-of-requests-from-botocore/ and--imo--it's a bad idea to rely on indirect dependencies – nijave May 18 '21 at 01:05

If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object:

import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
to-chomik

Have a look at httplib2, which, alongside a lot of other very useful features, provides exactly what you want.

import httplib2

resp, content = httplib2.Http().request("http://example.com/foo/bar")

Where content would be the response body (as a string), and resp would contain the status and response headers.

It isn't included with a standard Python install (though it only requires standard Python), but it's definitely worth checking out.

hennr

It's simple enough with the powerful urllib3 library.

Import it like this:

import urllib3

http = urllib3.PoolManager()

And make a request like this:

response = http.request('GET', 'https://example.com')

print(response.data) # Raw data.
print(response.data.decode('utf-8')) # Text.
print(response.status) # Status code.
print(response.headers['Content-Type']) # Content type.

You can add headers too:

response = http.request('GET', 'https://example.com', headers={
    'key1': 'value1',
    'key2': 'value2'
})

More info can be found on the urllib3 documentation.

urllib3 is much safer and easier to use than the built-in urllib.request or http modules, and it is stable.


In Python we can read from HTTP responses as we would from files; here is an example of reading JSON from an API.

import json
from urllib.request import urlopen

def get_json_value(url):
    # The response object is file-like, so json.load can read it directly
    with urlopen(url) as f:
        resp = json.load(f)
    return resp['some_key']
Katrych Taras
  • Though we thank you for your answer, it would be better if it provided additional value on top of the other answers. In this case, your answer does not provide additional value, since another user already posted that solution. If a previous answer was helpful to you, you should vote it up instead of repeating the same information. – Toby Speight Dec 10 '19 at 12:32
  • This is an old request/answer but I found value in this because it has the elegant `with...` syntax that I could just grab. – rich p Dec 11 '20 at 17:17
  • This answer adds value as it uses the with construct, which is much discussed in the comments on the top-voted and accepted answer, yet missing from it. – scravy Oct 05 '21 at 08:14

Without further necessary imports this solution works (for me) - also with https:

try:
    import urllib2 as urlreq # Python 2.x
except ImportError:
    import urllib.request as urlreq # Python 3.x
req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()

I often have difficulty grabbing the content when not specifying a "User-Agent" in the header information. Then the requests are usually rejected with something like: urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.

michael_s
  • Unexpectedly, the 'User-Agent' for Microsoft Edge really is something like `Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10136` according to https://stackoverflow.com/questions/30591706/what-is-the-user-agent-string-name-for-microsoft-edge. Not sure how to find out the most recent `Microsoft Edge UA string`, but the answer here rightly hints at the way to solve it. – questionto42 Oct 25 '20 at 18:40

How to also send headers

Python 3:

import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)

Python 2:

import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)
Ciro Santilli OurBigBook.com

theller's solution for wget is really useful; however, I found that it does not print the progress during the download. It's perfect if you add one line after the print statement in reporthook.

import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    sys.stdout.flush()
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
Xuan

Here is a wget script in Python:

# From python cookbook, 2nd edition, page 487
import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
theller

If you want a lower level API:

import http.client

conn = http.client.HTTPSConnection('example.com')
conn.request('GET', '/')

resp = conn.getresponse()
content = resp.read()

conn.close()

text = content.decode('utf-8')

print(text)

Excellent solutions, Xuan and theller.

For it to work with Python 3, make the following changes:

import sys, urllib.request

def reporthook(a, b, c):
    # end="" keeps the "\r" working as an in-place progress indicator
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="")
    sys.stdout.flush()
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()

Also, the URL you enter should be prefixed with "http://"; otherwise it returns an unknown url type error.
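That scheme check can be sketched as a tiny helper (`ensure_scheme` is a hypothetical name, not part of urllib):

```python
def ensure_scheme(url):
    # urlretrieve raises "unknown url type" without an explicit scheme,
    # so prepend "http://" when none is present.
    return url if "://" in url else "http://" + url
```

For example, `ensure_scheme("example.com")` returns `"http://example.com"`, while URLs that already carry a scheme pass through unchanged.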

Akshar

If you are working with HTTP APIs specifically, there are also more convenient choices such as Nap.

For example, here's how to get gists from Github since May 1st 2014:

from nap.url import Url
api = Url('https://api.github.com')

gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())

More examples: https://github.com/kimmobrunfeldt/nap#examples

Kimmo

For Python >= 3.6, you can use dload:

import dload
t = dload.text(url)

For json:

j = dload.json(url)

Install:
pip install dload

Pedro Lobito
  • The OP wanted to make a GET request WITHOUT using a library, while this solution requires you to install a package using pip and import the library. – Yılmaz Alpaslan Mar 18 '22 at 20:53
  • @YılmazAlpaslan OP asked for no such thing, that was an edit someone made to the title of the question that I have rolled back. The actual problem with this answer is it's recommending some weird library that no one is using. – Boris Verkhovskiy Apr 03 '22 at 12:19
  • As far as I understood, the OP asked for the "_quickest way to HTTP GET in Python_"; based on that, you can use the `dload` library, even if not many users use it, which is not a requirement for an answer. Just a guess, but I don't think you understood the question properly; reading other answers may give you a clue, because many different libraries are also recommended. – Pedro Lobito Apr 03 '22 at 16:15