
I am unable to fix a requests.get issue: the URL being hit renders roughly 50 MB of content. The browser takes about 4 minutes in total to get/display the complete response, while requests.get() keeps running for an eternity.

response = requests.get('http://<url-that-renders-contents-as-raw-data>', headers=<headers>, cookies=<cookies>, verify=False, stream=True)

Due to some privacy issues I can't share the actual URL, headers, cookies, or other params, but how do we fetch response.content, maybe in chunks, from a URL that returns raw data/log rows worth many MBs when hit with a GET request?

edit: Actually, it is a SimpleHTTPServer (or a SimpleAuthServer), and I need to get a bulky file from it over HTTP.

khanna
    Does this answer your question? [Download large file in python with requests](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) – congbaoguier Jan 14 '20 at 15:09
  • hey @congbaoguier, not really – actually this is a partial-content problem; even the browser, as it loads, shows a status_code of 206. – khanna Jan 15 '20 at 06:19
  • I am looking into whether some websockets need to be established to keep reading this 40 MB of data – khanna Jan 15 '20 at 06:38
  • Actually, it is a SimpleHTTPServer, or a SimpleAuthServer, and I need to get a bulky file over http from it. – khanna Jan 15 '20 at 07:01

2 Answers


How do you actually get the response content here? Since you've set stream=True, requests only downloads the headers, after which it waits for you to pull the actual data using Response.iter_lines, Response.iter_content, or direct IO on the Response.raw output stream.

Hard to help without more information, but since all of these let you read in bounded chunks, you can watch the progress of your reading and see whether it's completely locked up, or whether you never even reach that part (at which point you may want to enable low-level logging of http.client and urllib3; it's extremely noisy but will provide more insight).
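A minimal sketch of the chunked-read pattern described above. The poster's real URL/headers/cookies are private, so a local ThreadingHTTPServer serving a 1 MB payload stands in for the endpoint; the URL, chunk_size, and timeout values here are illustrative assumptions, not from the question.

```python
import http.server
import threading

import requests

# Stand-in payload for the poster's private ~50 MB endpoint (assumption).
PAYLOAD = b"x" * (1024 * 1024)

class _Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the stand-in server quiet

# Local server on an ephemeral port, replacing the private URL.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/big.dat"

received = 0
with requests.get(url, stream=True, timeout=(5, 30)) as response:
    response.raise_for_status()
    # iter_content yields the body in bounded chunks instead of buffering
    # everything in memory; counting bytes per chunk gives visible progress,
    # so a stall here points at the server rather than at requests.
    for chunk in response.iter_content(chunk_size=64 * 1024):
        received += len(chunk)

server.shutdown()
```

If the loop never yields its first chunk, the hang is in the connection/headers phase, which is where the http.client/urllib3 debug logging mentioned above helps.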

Masklinn
  • stream=True may be redundant. Good point though, I may try removing it once. – khanna Jan 15 '20 at 07:14
  • If you're fetching lots of data you really want stream=True: without it, Requests will first try to load all the data into memory upfront. With stream=True, on the other hand, you have more control and can "chunk" the read and observe it better. – Masklinn Jan 15 '20 at 07:16
  • As far as I can now see, it is basically a "SimpleAuthServer", and I am just trying to get a heavy file, say 50 MB (a .dat file), from it. I am exploring various options to get this done. – khanna Jan 15 '20 at 07:19

My sincere apologies, I have also been working with very minimal information from the task assignee. Actually, the URL itself isn't accessible from the instance. The approach that finally worked for me is what @congbaoguier suggested: [Download large file in python with requests](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests)

khanna