
I am creating a web crawler using Python and the requests library. To make the crawler faster, I want to download only part of each HTML page. I tried sending a Range header with the HTTP request like this:

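import requests

query = 'movie'
size = 10
start = 0
USER_AGENT = 'Mozilla/5.0'  # placeholder; the original code used self.USER_AGENT
session = requests.Session()
# Request Google by IP address and pass the real host name in the Host header
google_url = 'https://216.58.208.36/search?q={}&num={}&start={}'.format(query, size, start)
response = session.get(google_url, verify=False, headers={
    'User-Agent': USER_AGENT,
    'Host': 'www.google.com',
    'Range': 'bytes=0-100',  # ask for only the first 101 bytes
})
print(response.text)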

But it did not work: the whole HTML page was downloaded. Is there another way to do this?

hamid
  • Possible duplicate of [Only download a part of the document using python requests](https://stackoverflow.com/questions/23602412/only-download-a-part-of-the-document-using-python-requests) – Pitto Oct 09 '19 at 11:27
  • @Pitto Thanks for the reply. I tried that, but it did not work. I am looking for another way. – hamid Oct 09 '19 at 11:30
  • Did you also check the second answer in the link I provided? About the byte range, the page I linked says: "What if the byte-ranges are not supported by the server? This will fetch the entire content." That seems to be your case. – Pitto Oct 09 '19 at 12:25
  • I have tried urllib3 with the flag `preload_content=False`, which seems to work. From https://urllib3.readthedocs.io/en/latest/advanced-usage.html – hamid Oct 09 '19 at 12:41
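
A similar idea to the `preload_content=False` approach mentioned in the comments can be tried with requests itself by passing `stream=True` and reading only the first few chunks. The following is a minimal sketch, not a tested answer for Google specifically: `read_prefix` and `fetch_prefix` are hypothetical helper names, and the chunk size and timeout are assumptions. It still sends the Range header, but caps the read on the client side in case the server ignores it and replies with the full page.

```python
import requests


def read_prefix(chunks, max_bytes):
    """Collect at most max_bytes from an iterable of byte chunks."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= max_bytes:
            break  # stop pulling more data from the source
    return bytes(buffer[:max_bytes])


def fetch_prefix(url, max_bytes=100, timeout=10):
    """Hypothetical helper: download roughly the first max_bytes of url."""
    # Ask for a byte range; a server that ignores it replies 200 with the full body.
    headers = {"Range": f"bytes=0-{max_bytes - 1}"}
    # stream=True defers the body download until the content is iterated.
    with requests.get(url, headers=headers, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # Leaving the with-block closes the connection without reading the rest.
        return read_prefix(response.iter_content(chunk_size=1024), max_bytes)
```

A call such as `fetch_prefix(google_url, max_bytes=100)` would then return a `bytes` object of at most 100 bytes. Note that the network stack may already have received somewhat more data than is returned, so the saving is approximate.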
