Two things are key to understanding this behaviour:
1) Order of iteration over a set object
A set is an unordered collection. The language makes no guarantee as to what order you get a set's elements in when you iterate over it. The implementation is generally such that within the same session, the same set will yield its elements in the same order; but the same set with the same elements could behave differently between different sessions.
For example, try this in a few interactive interpreter sessions:
    print(list(set(('0', '1'))))
Sometimes you'll get ['0', '1'], sometimes you'll get ['1', '0'].
The difference arises because the set object internally arranges its items according to their hash, and Python by default applies hash seed randomisation to str (and bytes) objects, so that the "same" string will have a different hash in different sessions.
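To see the effect without restarting the interpreter by hand, here is a minimal sketch that runs the one-liner above in several child interpreters, each with a different value of the PYTHONHASHSEED environment variable. The seed values and the names SNIPPET and orders are arbitrary choices for illustration; which order each seed produces depends on your Python build, so no particular output is claimed.

```python
import os
import subprocess
import sys

# The one-liner from above, run in a fresh interpreter for each seed.
SNIPPET = "print(list(set(('0', '1'))))"

orders = {}
for seed in range(5):
    # PYTHONHASHSEED fixes the str hash seed for the child process only.
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    orders[seed] = result.stdout.strip()

for seed, order in orders.items():
    print(seed, order)
```

Each line of output is one of the two possible orders. Rerunning with the same seed on the same build should reproduce the same order, which is what makes the variation traceable to the hash seed.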
2) How x in y behaves for an iterator y
When you do x in y, Python will iterate through y until it finds element x. (If x isn't present, it will entirely exhaust the iterator.)
If y is an iterator that continues to exist after x in y, this will affect what the iterator subsequently yields.
For example:
    def gen_func():
        for i in range(100):
            yield i*i

    gen_obj = gen_func()
    res1 = 16 in gen_obj
    print(res1)  # True
    print(next(gen_obj))  # 25
    res2 = 16 in gen_obj
    print(res2)  # False: 16 is never yielded again
You don't see this when doing for example x in some_list or x in some_set, because in those cases, you're not getting a reference to an iterator over the collection and retaining it after the in operation.
But when you're subsequently using the iterator (a generator object, an iterator explicitly obtained through iter(my_obj), a map object, one of various itertools objects...), the iterator will yield only the elements that come after x.
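As a small illustration of that contrast, the following sketch runs the same membership tests against a list, an explicit iterator over it, and a map object. The variable names are made up for the example.

```python
names = ['0', '1', '2']

# A list supports repeated membership tests: nothing is consumed.
print('1' in names)   # True
print('1' in names)   # True

# An explicit iterator is advanced by each membership test.
it = iter(names)
print('1' in it)      # True: consumed '0' and '1'
print(list(it))       # ['2']: only what came after '1' is left

# A map object behaves the same way.
ints = map(int, names)
print(2 in ints)      # True: consumed 0, 1 and 2
print(0 in ints)      # False: the iterator is already exhausted
```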
Putting it together
We can see how these factors interact to give the behaviour you see by considering the case where you've called generate_name twice already, so that generated_names is {'0', '1'}.
If iterating over generated_names happens to yield '0' first, then the first iteration of the while loop increments name_int to 1, having consumed only 0 from the map object. In the second iteration of the while loop, the map object yields 1, so the loop body is entered, and name_int is incremented to 2. The function returns '2'.
However, if iterating over generated_names yields '1' first, then the first iteration of the while loop still increments name_int to 1, but consumes both 1 and 0 from the map object in doing so. In the second iteration of the while loop, the map object has nothing left to yield, the loop condition is false, so the loop body is not entered and name_int remains at 1. The function returns '1'.
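Both cases can be reproduced deterministically by substituting a list for the set, so that the iteration order is under our control. This is only a sketch: buggy_next_name is a hypothetical reconstruction of the loop described above (a single map object tested repeatedly in the while condition), not a copy of your code.

```python
def buggy_next_name(names_in_order):
    # Assumed shape of the original loop: one map object is created once
    # and then reused by every membership test in the while condition.
    int_names = map(int, names_in_order)
    name_int = 0
    while name_int in int_names:
        name_int += 1
    return str(name_int)

# Order '0', '1': each test consumes exactly one element.
print(buggy_next_name(['0', '1']))  # 2

# Order '1', '0': the first test consumes both elements,
# so the second test finds nothing and the loop stops early.
print(buggy_next_name(['1', '0']))  # 1
```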
Solutions
Your question was about understanding the behaviour rather than fixing it. But a quicker solution here, rather than converting every element of generated_names to int on every iteration of the loop, is simply to check whether str(name_int) is a member of generated_names:
    generated_names = {"0"}

    def generate_name() -> str:
        name_int = 0
        while str(name_int) in generated_names:
            name_int += 1
        name = str(name_int)
        generated_names.add(name)
        return name
This will reliably give the behaviour you're looking for.
(Obviously in this simple example one could just write a generator that increments an int every time and returns it converted to str: but I assume that your initial state might not actually be {'0'}, but rather some messier set with gaps that you want to fill.)
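As a quick check of the gap-filling case, here is the same function run against a hypothetical starting set with a gap at '2'. The starting contents are an assumption chosen for the example.

```python
# Hypothetical starting state: '2' is missing, so it should be filled first.
generated_names = {'0', '1', '3'}

def generate_name() -> str:
    name_int = 0
    # Membership test against the set itself: no iterator is consumed.
    while str(name_int) in generated_names:
        name_int += 1
    name = str(name_int)
    generated_names.add(name)
    return name

results = [generate_name() for _ in range(3)]
print(results)  # ['2', '4', '5']
```

Because the test is made against the set rather than an iterator over it, the result does not depend on the set's iteration order.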