
I want to implement a download speed limit with urllib/urllib2. The basic idea is to look at how much was downloaded in the past x seconds; if that amount is above the limit, the script simply sleeps for some time.
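For illustration, here is a rough sketch of that idea (a simplified variant that throttles on the average rate since the start of the transfer rather than over a sliding x-second window; the limit, chunk size, and function name are placeholders):

```python
import time
import urllib2  # Python 2; in Python 3 this lives in urllib.request

def download_limited(url, dest_path, limit_bps=100 * 1024, chunk_size=8192):
    """Download url to dest_path, sleeping whenever the average
    rate so far exceeds limit_bps (bytes per second)."""
    response = urllib2.urlopen(url)
    start = time.time()
    received = 0
    with open(dest_path, 'wb') as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            received += len(chunk)
            # Time the transfer *should* have taken at the limit rate;
            # if we are ahead of schedule, sleep off the difference.
            expected = received / float(limit_bps)
            elapsed = time.time() - start
            if expected > elapsed:
                time.sleep(expected - elapsed)
```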

Now the question is, what happens if you have an open connection (with urlopen()), but don't call the read() function for a while?

  1. Does urllib have a built-in buffer that fills up as data arrives, so that each call to read() drains n bytes from the buffer and the download continues (and, if the buffer is full, urllib waits)?
  2. If there is such a buffer, how big is it, and can one set the size manually?
  3. If there is no buffer, does urllib simply keep downloading?
  4. Is there a difference between the read() functions of urllib and urllib2, or are they the same?

1 Answer


I wholeheartedly recommend using the excellent requests library instead of urllib or urllib2.

See here and here for how others went about implementing transfer rate limiting.
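As a rough sketch of what those approaches look like with requests' streaming API (the rate limit, chunk size, and function name below are illustrative assumptions, not taken from the linked answers):

```python
import time
import requests

def download_limited(url, dest_path, limit_bps=100 * 1024):
    """Stream url to dest_path, throttled to limit_bps bytes/second."""
    start = time.time()
    received = 0
    r = requests.get(url, stream=True)  # with stream=True the body is fetched lazily
    try:
        with open(dest_path, 'wb') as out:
            for chunk in r.iter_content(chunk_size=8192):
                if not chunk:  # skip keep-alive chunks
                    continue
                out.write(chunk)
                received += len(chunk)
                # Sleep whenever we are ahead of the target rate.
                expected = received / float(limit_bps)
                elapsed = time.time() - start
                if expected > elapsed:
                    time.sleep(expected - elapsed)
    finally:
        r.close()  # release the connection back to the pool
```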

  • Thanks, I've looked at requests and the links you sent. But even with requests the question remains: what happens if request.iter_content(1024) is not called for a while? Does requests continue downloading, or does it pause? – goocreations Jan 15 '17 at 14:37
  • I think that when you use `requests`' streaming, the download pauses until you ask for the next chunk (see the sketch after these comments). The reason I think so is that I once used this trick to limit the amount of memory a large download was using. And when you set `stream=True` in the call to `requests.get`, your connection will not be returned to the pool until all the data has been consumed or you close it yourself. See if this [guide](http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow) helps. And don't forget `if chunk is not None: #do stuff` - chunks that are `None` are just keep-alives. – rtkaleta Jan 15 '17 at 18:36
  • Thanks for the update. I'll do a network analysis to see how much data my script consumes, just to make sure that requests actually pauses. – goocreations Jan 16 '17 at 06:22
  • Cool, let me know your findings! – rtkaleta Jan 16 '17 at 12:14
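A minimal sketch of the behavior described in the comments above (the URL and file name are placeholders):

```python
import requests

r = requests.get("http://example.com/big.bin", stream=True)  # placeholder URL
# At this point only the headers have been downloaded. Each iter_content()
# step pulls the next chunk off the socket, so between steps the transfer
# stalls once the OS receive buffer fills up (TCP flow control) - which is
# what makes throttling by sleeping between chunks work.
with open("big.bin", "wb") as out:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive chunks
            out.write(chunk)
# With stream=True the connection is returned to the pool only once all
# data has been consumed or the response is closed explicitly.
r.close()
```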