I want to know the number of items that a generator has generated.
I'm trying to do this by using the output of enumerate to set a global variable. This works in simple tests but fails once I adapt the technique to my real use case.
The following script first tests a generator that iterates over the lines of a file, then a generator that parses a file with the bioinformatics library I want to use (pysam):
#!/usr/bin/env python3

def test1(delete=False):
    # I have to comment the following otherwise I get:
    # $ ./test.py
    # Traceback (most recent call last):
    #   File "./test.py", line 60, in <module>
    #     test1()
    #   File "./test.py", line 31, in test1
    #     print(nb_things)
    # UnboundLocalError: local variable 'nb_things' referenced before assignment
    # if delete:
    #     try:
    #         del nb_things
    #         print("deleted nb_things")
    #     except NameError:
    #         pass
    with open("test.py") as this_file:
        def my_gen():
            for i, thing in enumerate(this_file, start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0

import pysam

def test2(delete=False):
    if delete:
        try:
            del nb_things
            print("deleted nb_things")
        except NameError:
            pass
    with pysam.AlignmentFile("/path/to/a/bam/file", "rb") as bamfile:
        def my_gen():
            for i, thing in enumerate(bamfile.fetch(), start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0

if __name__ == "__main__":
    test1()
    print("end of test 1")
    test2()
    print("end of test 2")
(As the comment in the script above shows, very strange things happen if I include code that mentions my global variable, even when that code is never executed.)
When I execute the above code, the first test succeeds, but not the second, despite a very similar code structure:
$ ./test.py
63
end of test 1
Traceback (most recent call last):
  File "./test.py", line 62, in <module>
    test2()
  File "./test.py", line 53, in test2
    for _ in g:
  File "./test.py", line 49, in my_gen
    nb_things = i
UnboundLocalError: local variable 'i' referenced before assignment
My main question is:
Why does the enumeration counter still exist after the end of the for loop in the first case and not in the second?
I suspect that this has to do with the way the iteration is stopped. In the second case, the generator somehow causes the enumerate counter to cease to exist once the internal iterator is stopped.
What could cause such a difference?
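For what it's worth, here is a minimal sketch that gives the same kind of error without pysam. It rests on an assumption I have not verified against my BAM file: that the iterable passed to enumerate yields no items at all, so the loop body never runs and i is never bound. The helper names count_items and try_count are hypothetical and only serve the illustration.

```python
def count_items(iterable):
    """Consume `iterable` and return the last counter value set by enumerate."""
    for i, _thing in enumerate(iterable, start=1):
        pass
    # If the iterable was empty, the loop body never ran and `i` was never bound,
    # so the next line raises UnboundLocalError.
    return i


def try_count(iterable):
    """Return the count, or the exception class name if counting failed."""
    try:
        return count_items(iterable)
    except UnboundLocalError as err:
        return type(err).__name__


non_empty_result = try_count(["a", "b", "c"])
empty_result = try_count([])
print(non_empty_result)
print(empty_result)
```

If this is indeed what happens in test2, initialising the counter before the loop (for example i = 0) would avoid the error, though it would not explain why fetch() yields nothing.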
A second question that occurred to me while designing the above test script is the following:
Why is the global variable nb_things considered local if I add code that references it but is never even executed? (Note the delete=False, and the absence of a message mentioning the deletion.)
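The second question can be reduced to the sketch below, which needs neither a file nor pysam. As far as I understand the Python language reference, a name that is the target of a del statement counts as bound in the enclosing function block, so the compiler treats it as local whether or not the statement ever runs. The function names here (reads_global, has_unexecuted_del, outcome) are hypothetical.

```python
nb_things = 42  # module-level (global) name


def reads_global():
    # No binding of nb_things in this function, so the global is looked up.
    return nb_things


def has_unexecuted_del(delete=False):
    if delete:
        # Never executed with delete=False, but its presence alone makes
        # nb_things a local name for the whole function body.
        del nb_things
    return nb_things


def outcome(func):
    """Return the function's result, or the exception class name on failure."""
    try:
        return func()
    except UnboundLocalError as err:
        return type(err).__name__


plain_result = outcome(reads_global)
del_result = outcome(has_unexecuted_del)
print(plain_result)
print(del_result)
```

Declaring global nb_things at the top of the function that contains the del would presumably make the name refer to the module-level variable again.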
I'm using Python 3.6 and pysam version 0.10.0.
For an earlier version of the real code (but the essential approach is there), and clues regarding why I ended up defining my generator in the main function, see this question. (Essentially, the reason is that the generator actually uses a function that is defined depending on command-line options.)