9

According to this answer, I can use the Range header to download only part of an HTML page, but with this code:

import requests

url = "http://stackoverflow.com"
headers = {"Range": "bytes=0-100"}  # first 101 bytes (byte ranges are inclusive)

r = requests.get(url, headers=headers)

print(r.text)

I get the whole HTML page. Why isn't it working?

Hyperion
  • 2
    `What if the byte-ranges are not supported by the server? This will fetch the entire content` - according to the comment in the page you linked to – Danny Cullen Nov 19 '16 at 10:54

3 Answers

4

If the web server does not support the Range header, the header is ignored and the full content is returned.

Try another server that supports the header, for example tools.ietf.org:

import requests

url = "http://tools.ietf.org/rfc/rfc2822.txt"
headers = {"Range": "bytes=0-100"}
r = requests.get(url, headers=headers)
assert len(r.content) <= 101  # bytes=0-100 is inclusive, so the body is at most 101 bytes
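
To tell whether a server honored the Range header or ignored it, you can look at the status code: 206 Partial Content means only the requested bytes were sent, while 200 OK means the whole body came back. Below is a minimal sketch of that check. The helper names `classify_range_response` and `get_range` are made up for illustration and are not part of requests.

```python
def classify_range_response(status_code):
    """Map the status code of a ranged GET to what the server did."""
    if status_code == 206:
        return "partial"        # Range honored: only the requested bytes were sent
    if status_code == 200:
        return "full"           # Range ignored: the whole body was sent
    if status_code == 416:
        return "unsatisfiable"  # requested range lies outside the resource
    return "other"


def get_range(url, first, last):
    """Request bytes first..last (inclusive) and report how the server responded."""
    import requests  # imported here so the helper above works without requests

    headers = {"Range": f"bytes={first}-{last}"}
    r = requests.get(url, headers=headers, timeout=10)
    return classify_range_response(r.status_code), r.content
```

Calling `get_range(url, 0, 100)` against a server that supports ranges should give `"partial"` and at most 101 bytes; against one that ignores the header it should give `"full"` and the entire body.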
falsetru
  • The requests package is great; this works fine here too: https://stackoverflow.com/questions/23602412/only-download-a-part-of-the-document-using-python-requests?answertab=oldest#tab-top – Tejas Tank Jun 09 '17 at 03:59
0

I'm having the same problem. The server I'm downloading from supports the Range header. Using requests, the header is ignored and the entire file is downloaded with a 200 status code. Meanwhile, sending the request via urllib3 correctly returns the partial content with a 206 status code.

I suppose this must be some kind of bug or incompatibility. requests works fine with other files, including the one in the example below. Accessing my file requires basic authorization - perhaps that has something to do with it?

If you run into this, urllib3 may be worth trying. You'll already have it because requests uses it. This is how I worked around my problem:

import urllib3

url = "https://www.rfc-editor.org/rfc/rfc2822.txt"
http = urllib3.PoolManager()
response = http.request('GET', url, headers={'Range':'bytes=0-100'})
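
To confirm that the partial response really covers the bytes you asked for, you can check for status 206 and read the Content-Range header, which has the form `bytes first-last/total`. This is only a sketch: `parse_content_range` and `fetch_partial` are hypothetical helper names, and the regular expression handles only the common single-range form.

```python
import re


def parse_content_range(value):
    """Parse 'bytes 0-100/24318' into (first, last, total).

    total is None when the server reports the length as '*'.
    Returns None if the value does not have the expected form.
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        return None
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)


def fetch_partial(url, first, last):
    """Fetch an inclusive byte range with urllib3, failing if it is not a 206."""
    import urllib3  # imported here so the parser above works without urllib3

    http = urllib3.PoolManager()
    resp = http.request("GET", url, headers={"Range": f"bytes={first}-{last}"})
    if resp.status != 206:
        raise RuntimeError(f"Range not honored, got status {resp.status}")
    return parse_content_range(resp.headers.get("Content-Range", "")), resp.data
```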

Update: I tried sending a Range header to https://stackoverflow.com/, which is the link you specified. This returns the entire content with both Python modules as well as curl, despite the response header specifying accept-ranges: bytes. I can't say why.
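
When a server ignores the Range header like this, one possible fallback is to stream the response and stop reading once you have enough bytes, so the rest of the body is never downloaded. This is a sketch under that assumption, not a replacement for real range support: `take_bytes` and `fetch_prefix` are hypothetical names, and it can only fetch a prefix of the document, not an arbitrary range.

```python
def take_bytes(chunks, limit):
    """Collect at most `limit` bytes from an iterable of byte chunks."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= limit:
            break  # stop pulling further chunks once we have enough
    return bytes(buf[:limit])


def fetch_prefix(url, limit=101):
    """Return the first `limit` bytes of the body by streaming the response."""
    import requests  # imported here so take_bytes works without requests

    # stream=True defers the body download; leaving the with-block closes
    # the connection, so the unread remainder is not fetched.
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        # iter_content yields the body after content decoding (e.g. gzip).
        return take_bytes(r.iter_content(chunk_size=1024), limit)
```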

Stefan
-5

I tried it without using:

headers = {"Range": "bytes=0-100"} 

Try to use this:

import requests

# you can change the URL
response = requests.get('http://example.com/')

print(response.text)
  • 6
    Please don't post only code as an answer, but also provide an explanation of what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes – Ran Marciano Jan 31 '21 at 06:53