I am trying to build a download accelerator for Linux. The program uses gevent, os, and urllib2. It receives a URL and attempts to download the file concurrently. The rest of my code works. My only problem is that urllib2.urlopen(URL).read() prevents me from running .read() concurrently.

This is the exception that's thrown at me:

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/gevent/greenlet.py", line 405, in run
    result = self._run(*self.args, **self.kwargs)
  File "gevent_concurrent_downloader.py", line 94, in childTasklet
    _tempRead = handle.read(divisor) # Read/Download part
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/pymodules/python2.7/gevent/socket.py", line 407, in recv
    wait_read(sock.fileno(), timeout=self.timeout, event=self._read_event)
  File "/usr/lib/pymodules/python2.7/gevent/socket.py", line 153, in wait_read
    assert event.arg is None, 'This event is already used by another greenlet: %r' % (event.arg, )
AssertionError: This event is already used by another greenlet: (<Greenlet at 0x2304958: childTasklet(<__main__.NewFile object at 0x22c4390>, 4595517, <addinfourl at 37154616 whose fp = <socket._fileob, 459551, 1)>, timeout('timed out',))
<Greenlet at 0x2304ea8: childTasklet(<__main__.NewFile object at 0x22c4390>,4595517, <addinfourl at 37154616 whose fp = <socket._fileob, 7, -1)failed with AssertionError

My program works by getting the file byte size from the URL by invoking:

urllib2.urlopen(URL).info().get("Content-Length") 

and dividing the file size by a divisor, which breaks the download into parts. In this example I am breaking the download into 10 parts.

Each greenlet runs a command in this fashion:

urllib2.urlopen(URL).read(offset)

Here's a link to my code hosted on pastie: http://pastie.org/3253705

Thank you for the help!

FYI: I am running on Ubuntu 11.10.

SuperA

2 Answers

You're trying to read a response to a single request from different greenlets.

If you'd like to download the same file over several concurrent connections, you could use the HTTP Range header, provided the server supports it (a request with a Range header then returns status 206 instead of 200). See HTTPRangeHandler.
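
A minimal sketch of the Range approach, assuming the server honours Range requests and reports a Content-Length. The helper names split_ranges, range_header, and fetch_range are hypothetical and do not come from the question's code. The import fallback is only there so the sketch also runs on Python 3, where urllib2 became urllib.request. Each call to fetch_range opens its own connection, so no two greenlets share a response object.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent


def split_ranges(total_size, parts):
    """Return inclusive (start, end) byte offsets that cover total_size bytes."""
    if parts < 1 or total_size < parts:
        raise ValueError("need at least one byte per part")
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        # The last part absorbs the remainder of the integer division.
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


def range_header(start, end):
    """Format the value of an HTTP Range header for one byte range."""
    return "bytes=%d-%d" % (start, end)


def fetch_range(url, start, end):
    """Download one byte range over its own connection (hypothetical helper)."""
    request = urllib2.Request(url, headers={"Range": range_header(start, end)})
    response = urllib2.urlopen(request)
    try:
        # 206 Partial Content means the server applied the Range header.
        if response.getcode() != 206:
            raise IOError("server did not return partial content")
        return response.read()
    finally:
        response.close()
```

With gevent, one greenlet per (start, end) pair could call fetch_range and write the returned bytes at offset start in the output file. That part is left out here because it depends on how the rest of the program is organised.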

Community
jfs

The argument to read is a number of bytes, not an offset.
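
A small illustration of that point, using io.BytesIO as a stand-in for the HTTP response (an assumption made only so the example runs without a network; file-like objects share the same read(size) behaviour):

```python
import io

# Stand-in for the response body; a real response object behaves the same way.
stream = io.BytesIO(b"0123456789")

first = stream.read(4)   # reads 4 bytes starting at the current position
second = stream.read(4)  # continues where the previous read stopped
rest = stream.read(4)    # only 2 bytes are left, so fewer are returned

print(first, second, rest)
```

Each call consumes the next bytes of the same stream, so passing a larger number does not skip ahead in the file.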

It seems gevent will let you call urllib asynchronously, but not let you access the same resource from multiple greenlets.

Furthermore, since it is using wait_read, the effect will still be a synchronous, sequential read from the file (the complete opposite of what you wanted to achieve).

I'd suggest you might need to go lower than, or use a different library from, urllib2.

Ivo