75

I'm new to Python and reading someone else's code:

Should urllib.urlopen() be followed by urllib.close()? Otherwise one would leak connections, correct?

Charles Menguy
Nikita

5 Answers

109

The close method must be called on the result of urllib.urlopen, not on the urllib module itself (urllib.close, which you mention, doesn't exist).

The best approach: instead of x = urllib.urlopen(u) etc, use:

import contextlib
import urllib  # Python 2 module; Python 3 moved urlopen to urllib.request

with contextlib.closing(urllib.urlopen(u)) as x:
    data = x.read()  # ...use x at will here...

The with statement and the closing context manager ensure proper closure even in the presence of exceptions.
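To see what closing buys you, here is a minimal sketch in Python 3. FakeResponse and use_and_fail are hypothetical names made up for illustration; FakeResponse only mimics the one property that matters here, namely having a close method, so the sketch runs without any network access.

```python
import contextlib


class FakeResponse:
    """Hypothetical stand-in for the object that urlopen returns."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def use_and_fail():
    """Raise inside the with block and report whether close() still ran."""
    response = FakeResponse()
    try:
        with contextlib.closing(response) as x:
            raise RuntimeError("simulated failure while using x")
    except RuntimeError:
        pass  # the error is expected; we only care about the cleanup
    return response.closed


was_closed = use_and_fail()
```

The exception escapes the with block, yet close() has already been called by the time the except clause runs, which is the behaviour described above.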

Alex Martelli
  • 11
    what about doing something like `data = urllib2.urlopen('url').read()` – Facundo Casco Oct 09 '11 at 00:43
  • 21
    In Python 3, direct support for the with statement was added: `with urllib.urlopen(u) as x: ...` – merwok Mar 13 '12 at 14:25
  • How come the [python3 doc](https://docs.python.org/3/library/contextlib.html) still mentions `contextlib.closing` in this (ahem) context? – user66081 Dec 20 '18 at 02:28
  • @ÉricAraujo: In python 3, `urllib.urlopen` doesn't exist at all. – Eric Oct 13 '19 at 22:40
  • It was moved to a new submodule urllib.request: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen – merwok Oct 19 '19 at 02:34
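Putting the comments above together for Python 3: the response returned by urllib.request.urlopen can be used directly in a with statement. A minimal sketch follows; the function name fetch and the data: URL are assumptions chosen so the example runs offline, and a real http:// address would be used the same way.

```python
import urllib.request


def fetch(url):
    """Read a URL and report whether the response was closed afterwards."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    # Leaving the with block closes the response, even if read() had raised.
    return body, response.closed


# A data: URL stands in for a real address so no network is needed.
body, closed = fetch("data:text/plain,hello")
```

contextlib.closing remains useful for objects that have a close method but do not implement the context manager protocol themselves.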
13

As @Peter says, opened URLs that go out of scope become eligible for garbage collection.

However, also note that in CPython URLopener defines:

    def __del__(self):
        self.close()

This means that when the reference count for that instance reaches zero, its __del__ method will be called, and thus its close method will be called as well. The most "normal" way for the reference count to reach zero is to simply let the instance go out of scope, but nothing stops you from running an explicit del x early. Note that del x does not call __del__ directly; it only removes the name x and decrements the reference count by one, so __del__ runs only if that was the last reference.

It's certainly good style to explicitly close your resources -- especially when your application runs the risk of using too much of said resources -- but Python will automatically clean up for you if you don't do anything funny like maintaining (circular?) references to instances that you don't need any more.
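The reference-counting behaviour can be checked with a small sketch. Resource, events and demo are hypothetical names; Resource simply copies the __del__ pattern shown above. The result relies on CPython's reference counting, so other implementations may finalize the object later or not at all.

```python
events = []  # records every close() call


class Resource:
    """Hypothetical resource that closes itself when finalized."""

    def close(self):
        events.append("closed")

    def __del__(self):
        self.close()


def demo():
    events.clear()
    x = Resource()
    y = x   # a second reference to the same instance
    del x   # count drops from 2 to 1, so __del__ is not called
    after_first_del = list(events)
    del y   # count reaches 0, so CPython calls __del__ right away
    after_second_del = list(events)
    return after_first_del, after_second_del


first, second = demo()
```

Here first is empty and second contains one "closed" entry: the explicit del only triggers cleanup when it removes the last reference.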

ruohola
Mark Rushakoff
  • 3
    It's possible, however, to overrun the garbage collector -- I've had cases where I'm creating file handles faster than it closes them [but where an explicit `gc.collect()` call, or a `close()`, cleaned things up]. – Charles Duffy Apr 25 '12 at 01:23
6

Strictly speaking, this is true. But in practice, once (if) the object returned by urlopen goes out of scope, the connection will be closed by the automatic garbage collector.

Peter
  • 12
    This is true of some implementations of Python, but the Python language does not guarantee that the closing will happen as soon as the object goes out of scope. Cf. Jython. – John La Rooy Oct 05 '09 at 23:05
  • 1
    @gnibbler The author of this answer doesn't state it will happen *as soon as* only that it will happen. – Piotr Dobrogost Apr 26 '12 at 17:12
  • 3
    @Piotr, but maybe the program crashes if I have a loop opening urls and the GC isn't reaping them fast enough. It's a pretty sloppy way to do things and doesn't belong in production code. – John La Rooy Apr 26 '12 at 22:11
  • 2
    The no-op GC (i.e., a GC that never, ever runs) is perfectly valid for Python. You have no guarantee the GC will ever run. And `gc.disable` can disable the GC in most Python implementations. – gsnedders Aug 27 '13 at 12:52
  • 1
    I managed to run out of available connections before the GC went and did any cleanup. So yes, you should call close if you don't want a sudden, hard-to-find loss of connectivity. – andrew pate Aug 15 '14 at 12:17
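The point made in these comments, that cleanup can be delayed until the collector actually runs, can be reproduced with a reference cycle. This is a sketch for CPython 3.4 or later; Node, closed and demo_cycle are hypothetical names, and the collector is switched off only to make the outcome deterministic.

```python
import gc

closed = []  # names of the nodes whose __del__ has run


class Node:
    """Hypothetical resource holder that can point at a partner node."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        closed.append(self.name)


def demo_cycle():
    closed.clear()
    was_enabled = gc.isenabled()
    gc.disable()  # simulate a collector that has not run yet
    try:
        a, b = Node("a"), Node("b")
        a.partner, b.partner = b, a  # build a reference cycle
        del a, b                     # the counts never reach zero
        before = sorted(closed)
        gc.collect()                 # only now are the finalizers called
        after = sorted(closed)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


before, after = demo_cycle()
```

Until gc.collect() runs, neither finalizer is called, so any socket held by such objects would stay open; an explicit close() does not depend on the collector at all.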
1

You basically do need to explicitly close your connection when using IronPython. The automatic closing on going out of scope relies on garbage collection. I ran into a situation where the garbage collector did not run for so long that Windows ran out of sockets. I was polling a webserver at high frequency (i.e. as high as IronPython and the connection would allow, ~7 Hz). I could see the "established connections" (i.e. sockets in use) go up and up in PerfMon. The solution was to call gc.collect() after every call to urlopen.
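An alternative to forcing a collection after every request is to close each response explicitly before issuing the next one. The sketch below is written for Python 3 and has not been tried on IronPython; poll is a hypothetical helper, and the data: URL is a stand-in so the loop runs without a server.

```python
import urllib.request


def poll(url, times):
    """Fetch url repeatedly, closing every response before the next request."""
    bodies = []
    for _ in range(times):
        response = urllib.request.urlopen(url)
        try:
            bodies.append(response.read())
        finally:
            response.close()  # release the socket now, not when the GC runs
    return bodies


results = poll("data:text/plain,pong", 3)
```

With try/finally the number of open sockets stays at one at most, however slowly the garbage collector runs.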

Jann Poppinga
0

The urllib.request module uses HTTP/1.1 and includes a Connection: close header in its HTTP requests.

This comes from the official docs; you can check it here.
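One way to check the header yourself is to point urllib.request at a throwaway local server and record what arrives. This is only a sketch: RecordingHandler, seen and observed_connection_header are hypothetical names, the server binds to 127.0.0.1 on a free port, and proxies are disabled on the assumption that the request should reach the local server directly.

```python
import http.server
import threading
import urllib.request

seen = []  # Connection header values received by the test server


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        seen.append(self.headers.get("Connection"))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def observed_connection_header():
    """Send one request to a local server and return its Connection header."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        with opener.open(url, timeout=5) as response:
            response.read()
    finally:
        server.shutdown()
        server.server_close()
    return seen[-1]


header = observed_connection_header()
```

Note that the header tells the server not to keep the connection alive; it does not close the response object on the client side, so calling close() or using a with block is still advisable.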

André Carvalho