In this question "Generating an MD5 checksum of a file", I had this code:
import hashlib

def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.digest()

[(fname, hashfile(open(fname, 'rb'), hashlib.sha256())) for fname in fnamelst]
I was criticized for opening a file inside a list comprehension, and one person opined that with a long enough list I would run out of open file handles. The suggested alternatives had hashfile take a filename argument and use with, which significantly reduces hashfile's flexibility.
Were these necessary? Was I really doing something that wrong?
Testing out this code:
#!/usr/bin/python3

import sys
from pprint import pprint  # Pretty printing

class HereAndGone(object):
    def __init__(self, i):
        print("%d %x -> coming into existence." % (i, id(self)),
              file=sys.stderr)
        self.i_ = i

    def __del__(self):
        print("%d %x <- going away now." % (self.i_, id(self)),
              file=sys.stderr)

def do_nothing(hag):
    return id(hag)

l = [(i, do_nothing(HereAndGone(i))) for i in range(0, 10)]

pprint(l)
results in this output:
0 7f0346decef0 -> coming into existence.
0 7f0346decef0 <- going away now.
1 7f0346decef0 -> coming into existence.
1 7f0346decef0 <- going away now.
2 7f0346decef0 -> coming into existence.
2 7f0346decef0 <- going away now.
3 7f0346decef0 -> coming into existence.
3 7f0346decef0 <- going away now.
4 7f0346decef0 -> coming into existence.
4 7f0346decef0 <- going away now.
5 7f0346decef0 -> coming into existence.
5 7f0346decef0 <- going away now.
6 7f0346decef0 -> coming into existence.
6 7f0346decef0 <- going away now.
7 7f0346decef0 -> coming into existence.
7 7f0346decef0 <- going away now.
8 7f0346decef0 -> coming into existence.
8 7f0346decef0 <- going away now.
9 7f0346decef0 -> coming into existence.
9 7f0346decef0 <- going away now.
[(0, 139652050636528),
(1, 139652050636528),
(2, 139652050636528),
(3, 139652050636528),
(4, 139652050636528),
(5, 139652050636528),
(6, 139652050636528),
(7, 139652050636528),
(8, 139652050636528),
(9, 139652050636528)]
It's obvious that each HereAndGone object is being created and destroyed as each element of the list comprehension is constructed. Python reference counting frees the object as soon as there are no references to it, which happens immediately after the value for that list element is computed.
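The same behaviour can be checked without relying on __del__ or on id() values being reused. The following is only a sketch under the assumption that it runs on CPython; Probe and consume are hypothetical names made up for this test, and weak references are used so that the check itself does not keep any object alive.

```python
import weakref

class Probe:
    """Hypothetical stand-in for an open file object."""

refs = []          # weak references to every Probe created so far
alive_before = []  # how many earlier Probes were still alive at each call

def consume(obj):
    # Count the survivors from earlier iterations before recording this one.
    alive_before.append(sum(r() is not None for r in refs))
    # A weak reference does not extend the lifetime of obj.
    refs.append(weakref.ref(obj))
    return len(refs)

results = [(i, consume(Probe())) for i in range(3)]

# On CPython I would expect no earlier Probe to be alive at any call,
# and every weak reference to be dead once the comprehension finishes.
print(alive_before)
print([r() is None for r in refs])
```

If an implementation delayed freeing the temporaries, alive_before would contain non-zero counts instead.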
Of course, maybe some other Python implementations don't do this. Is it required for a Python implementation to do some form of reference counting? It certainly seems from the documentation of the gc module like reference counting is a core feature of the language.
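To see what the gc module does and does not control, here is a small sketch, again assuming CPython. Node is a hypothetical class used only for this demonstration. It disables the collector that gc manages and then checks whether an object without cycles is still freed immediately, and whether a reference cycle has to wait for an explicit gc.collect(). This only shows what CPython does; it says nothing about what other implementations are required to do.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class used only for this demonstration."""
    def __init__(self):
        self.other = None

gc.disable()  # switches off the cyclic collector, not reference counting
try:
    # Case 1: no cycle. Dropping the last reference should free the object.
    plain = Node()
    plain_ref = weakref.ref(plain)
    del plain
    plain_freed_without_gc = plain_ref() is None

    # Case 2: two objects that refer to each other.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    cycle_alive_before_collect = cycle_ref() is not None

    gc.collect()  # an explicit collection still works while gc is disabled
    cycle_freed_after_collect = cycle_ref() is None
finally:
    gc.enable()

print(plain_freed_without_gc,
      cycle_alive_before_collect,
      cycle_freed_after_collect)
```

On CPython I would expect all three flags to be true, which suggests that gc only handles the cases reference counting cannot.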
And, if I did do something wrong, how would you suggest I re-write it to retain the succinct clarity of a list comprehension and the flexibility of an interface that works with anything that can be read like a file?
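One shape such a rewrite could take, as a minimal sketch rather than a definitive answer: keep hashfile exactly as it is, so it still accepts anything with a read method, and add a thin wrapper that owns the open and close. The name hash_path and its hasher_factory parameter are hypothetical, introduced only for this illustration.

```python
import hashlib
import io

def hashfile(afile, hasher, blocksize=65536):
    # Unchanged: works with any object that has a read(n) method.
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.digest()

def hash_path(fname, hasher_factory=hashlib.sha256, blocksize=65536):
    # Hypothetical wrapper: the with block closes the file deterministically,
    # regardless of how the interpreter manages object lifetimes.
    with open(fname, 'rb') as afile:
        return hashfile(afile, hasher_factory(), blocksize)

def hash_paths(fnamelst, hasher_factory=hashlib.sha256):
    # Still a one-line list comprehension, with one file open at a time.
    return [(fname, hash_path(fname, hasher_factory)) for fname in fnamelst]

# hashfile itself remains usable with non-file objects:
in_memory_digest = hashfile(io.BytesIO(b"example data"), hashlib.sha256())
```

With this split, the comprehension never holds an open file, and callers who already have a file-like object can keep calling hashfile directly.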