
I'm using urllib2 to load files from FTP and HTTP servers.

Some of the servers support only one connection per IP address. The problem is that urllib2 does not close the connection instantly. Look at this example program:

from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)

The code loads two files (here the two files are the same) from an FTP server which supports only one connection. This will print the following log:

loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>

So the first file loads, and the second fails because the first connection was not closed in time.

But when I use sleep(1) after f.close(), the error does not occur:

loaded 463675266
loaded 463675266

Is there any way to force the connection to close so that the second download does not fail?
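In the meantime, the sleep(1) workaround can at least be made less wasteful with a retry loop that backs off only when the server actually rejects the connection (a sketch; `retry`, `attempts` and `delay` are made-up names, not part of urllib2):

```python
import time

def retry(func, attempts=3, delay=1.0):
    # call func(); on failure, wait and try again, up to `attempts` times
    for i in range(attempts):
        try:
            return func()
        except Exception:  # in real code, catch urllib2.URLError specifically
            if i == attempts - 1:
                raise
            time.sleep(delay)  # give the server time to drop the old connection

# usage: retry(lambda: load_file(url))
```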

    possible duplicate of [should I call close() after urllib.urlopen()?](http://stackoverflow.com/questions/1522636/should-i-call-close-after-urllib-urlopen) – moinudin Mar 26 '11 at 12:40
  • @marcog I don't think that this is the same question :-) The user of the other thread asked whether he should close the "connection". I know that I should close the connection (and I will close it :-)), but as mentioned above the connection is not closed immediately when using `close()` ... or `contextlib.closing` (which calls `close`). – Biggie Mar 26 '11 at 12:47
  • Okay sorry, my bad. I would take the vote back if I could. – moinudin Mar 26 '11 at 12:58

4 Answers


The cause is indeed a file descriptor leak. We also found that with Jython the problem is much more obvious than with CPython. A colleague proposed this solution:

    req = urllib2.Request(url, headers=header)
    fdurl = urllib2.urlopen(req, timeout=self.timeout)
    # keep a reference to the "real" socket so we can force it closed later
    realsock = fdurl.fp._sock.fp._sock
    try:
        data = fdurl.read()
    except urllib2.URLError as e:
        print "urlopen exception", e
    realsock.close()  # force-close the underlying connection
    fdurl.close()

The fix is ugly, but it does the job: no more "too many connections" errors.
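The effect of closing the raw socket directly can be illustrated with a plain socket pair (a sketch, not urllib2-specific; `socketpair` assumes a Unix-like OS):

```python
import os
import socket

# Closing the raw socket object releases the OS-level file descriptor at
# once; any wrapper objects still holding it no longer matter to the server.
a, b = socket.socketpair()
fd = a.fileno()
a.close()              # analogous to realsock.close() above
try:
    os.fstat(fd)       # raises if the descriptor is really gone
    fd_released = False
except OSError:
    fd_released = True
b.close()
```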

    is there a good reason why `urlopen` is called twice? And why `req` is used before it's assigned? – drevicko Mar 14 '14 at 12:05

Biggie: I think it's because the connection is never shut down with shutdown().

Note close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close().

You could try something like this before f.close():

import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)

(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
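The difference between close() and shutdown() can be demonstrated with a plain socket pair (a minimal sketch, assuming a Unix-like OS for `socketpair`; nothing here is urllib2-specific):

```python
import socket

a, b = socket.socketpair()
a.shutdown(socket.SHUT_RDWR)  # actively tears the connection down:
                              # the peer is notified immediately
data = b.recv(1024)           # the peer reads EOF right away
a.close()
b.close()
```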


As of Python 2.7.1, urllib2 indeed leaks a file descriptor: https://bugs.pypy.org/issue867
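A quick way to watch for such leaks yourself on Linux is to count the entries in /proc/self/fd (a debugging sketch; assumes a /proc filesystem is available):

```python
import os

def open_fd_count():
    # each entry in /proc/self/fd is one open descriptor of this process
    return len(os.listdir('/proc/self/fd'))

before = open_fd_count()
f = open('/dev/null')        # opening a file adds one descriptor
after_open = open_fd_count()
f.close()                    # closing it releases the descriptor again
after_close = open_fd_count()
```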


Alex Martelli answered a similar question. Read this: should I call close() after urllib.urlopen()?

In a nutshell:

import contextlib
import urllib

with contextlib.closing(urllib.urlopen(u)) as x:
    data = x.read()
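To be clear about what contextlib.closing actually does, it only guarantees that close() is called on exit from the with-block, nothing more. A toy stand-in object shows this (the `Conn` class is made up for illustration):

```python
import contextlib

class Conn(object):
    """Toy stand-in for whatever urlopen returns."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

c = Conn()
with contextlib.closing(c):
    inside = c.closed   # still open inside the block
# on leaving the block, closing() has called c.close()
```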
  • As you can see [here](http://docs.python.org/library/contextlib.html#contextlib.closing), `contextlib.closing` just calls `close()`. That's what I do manually in the code above, too. So the problem still exists: the second download will fail because the first connection is not closed instantly by `close()`. – Biggie Mar 26 '11 at 12:42
  • Hmmm I see, sorry for my answer. I'll keep you informed If I manage to solve the problem. – Sandro Munda Mar 26 '11 at 12:48