0

The following function was tested in the most recent version of python and sometimes returns 0, 1, 2, or 3 athousand times but most of the time it returns 1 a thousand times.

generated_names = set(["0"])

def generate_name()->str:
        global generated_names

        name_int = 0
        
        already_generated_names_int = map(int, generated_names)

        while name_int in already_generated_names_int:
            name_int +=1

        name = str(name_int)

        generated_names.add(name)
        return name

for i in range(1000):
    print(generate_name())

Specifically the thing seems to be non deterministic about it is the "in" command on a map object (name_int in already_generated_names_int). Shouldn't the "in" command work the same on map objects as any other iterable?

It works if you cast the map to a set, outputing 1, 2, 3, 4... etc. But why doesn't it work for just maps?

Animated
  • 1
  • 1
  • 3
    Do you know what `map` returns / what `map` does? – luk2302 Aug 21 '23 at 14:32
  • 3
    `map()` returns a generator. You can only reliably use `in` on it *once*, because elements will have been consumed from the generator (up until the matching element is found), and won't be re-examined if you use `in` again. The call to `map()` is completely pointless, anyway - if you stored `generated_names` as a set of ints instead of strings, you could directly test it using `in`. – jasonharper Aug 21 '23 at 14:34
  • 3
    @jasonharper map does not return a generator. It is an iterator. Generators need to have a send method, and they need to deal with exceptions in a specific way that map doesn't. – wim Aug 21 '23 at 14:35
  • @Animated doing it this way looks like a huge waste of resources. Why don't use store just the max id? – Marat Aug 21 '23 at 14:38
  • (`map` returns an iterator, but not a generator. Generators are a very specific type of iterator. Iterators implemented at Python level are almost always implemented as generators, while iterators implemented in C are never generators.) – user2357112 Aug 21 '23 at 14:39
  • The thing that's nondeterministic here is the order of iteration over a set. If `generated_names` is `{'0', '1'}` there's no guarantee as to which item will be yielded first: there's nothing really special about `map` here compared to other ways of iterating. – slothrop Aug 21 '23 at 14:47
  • @slothrop Why would the order matter when all you're doing is checking whether it contains an item? The problem here is that `name_int in already_generated_names_int` always returns false the second time around and that very much is special to `map` (or at least to iterators as opposed to lists and sets). – sepp2k Aug 21 '23 at 15:48
  • @sepp2k *"always returns false the second time around"* - How do you explain that they got 2 or 3? – Kelly Bundy Aug 21 '23 at 15:59
  • @KellyBundy You're right, it's not always false the second time around, but rather the second time around it doesn't contain any elements that it already checked the first time. Either way, the problem is that iterators consume elements. It would work fine with a set. – sepp2k Aug 21 '23 at 16:03
  • @sepp2k Order matters, because it affects how many (if any) items remain to be yielded on subsequent iterations. – slothrop Aug 21 '23 at 16:38
  • @sepp2k *"at least to iterators as opposed to lists and sets"* is correct, but OP's question was specifically about *"non deterministic [behaviour of the 'in´ command on a map object"* If we're looking at why the function behaves nondeterministically (as opposed to just contrary to what was intended) then the set object is the key to it. To be fair, I should have signposted my comment more clearly as an explanation of the behaviour being **nondeterministic** rather than being **wrong**. – slothrop Aug 21 '23 at 16:44
  • @slothrop The set object actually does behave deterministically. – Kelly Bundy Aug 21 '23 at 16:54
  • Fair enough - to be more precise then, for some given sequence `strings` of strings, the iteration order of `set(strings)` will vary from session to session, because of hash randomisation. – slothrop Aug 21 '23 at 17:02

1 Answers1

3

Two things are key to understanding this behaviour:

1) Order of iteration over a set object

A set is an unordered collection. The language makes no guarantee as to what order you get a set's elements in when you iterate over it. The implementation is generally such that within the same session, the same set will yield its elements in the same order; but the same set with the same elements could behave differently between different sessions.

For example, try this in a few interactive interpreter sessions:

print(list(set(('0', '1'))))

Sometimes you'll get ['0', '1'], sometimes you'll get ['1', '0'].

The difference arises because the set object internally arranges its items according to their hash, and Python applies hash seed randomisation so that the "same" object will have a different hash in different sessions.

2) How x in y behaves for an iterator y

When you do x in y, Python will iterate through y until it finds element x. (If x isn't present, it will entirely exhaust the iterator.)

If y is an iterator that continues to exist after x in y, this will affect what the iterator subsequently yields.

For example:

def gen_func():
    for i in range(100):
        yield i*i

gen_obj = gen_func()
res1 = 16 in gen_obj
print(res1)           # True
print(next(gen_obj))  # 25

res2 = 16 in gen_obj
print(res2)           # False: 16 is never yielded again

You don't see this when doing for example x in some_list or x in some_set, because in those cases, you're not getting a reference to an iterator over the collection and retaining it after the in operation.

But when you're subsequently using the iterator (a generator object, an iterator explicitly obtained through iter(my_obj), a map object, one of various itertools objects...), the iterator will yield only the elements that come after x.

Putting it together

We can see how these factors interact to give the behaviour you see by considering the case where you've called generate_name twice already, so that generated_names is {'0', '1'}.

  1. If iterating over generated_names happens to yield '0' first, then the first iteration of the while loop increments name_int to 1, having consumed only 0 from the map object. In the second iteration of the while loop, the map object yields 1, so the loop body is entered, and name_int is incremented to 2. The function returns '2'.

  2. However, if iterating over generated_names yields '1' first, then the first iteration of the while loop still increments name_int to 1, but consumes both 1 and 0 from the map object in doing so. In the second iteration of the while loop, the map object has nothing left to yield, the loop condition is false, so the loop body is not entered and name_int remains at 1. The function returns '1'.

Solutions

Your question was about understanding the behaviour rather than fixing it. But a quicker solution here, rather than converting every element of generated_names to int on every iteration of the loop, is simply to check whether str(name_int) is a member of generated_names:

generated_names = set(["0"])

def generate_name()->str:
        name_int = 0
        while str(name_int) in generated_names:
            name_int += 1
        name = str(name_int)
        generated_names.add(name)
        return name

This will reliably give the behaviour you're looking for.

(Obviously in this simple example one could just write a generator that increments an int every time and returns it converted to str: but I assume that your initial state might not actually be {'0'}, but rather some messier set with gaps that you want to fill.)

slothrop
  • 3,218
  • 1
  • 18
  • 11
  • Thank you very much! This makes it much clearer why this behavior occurs. Python sure is a weird language though; I wonder specifically why the list and set iterators don't persist after using "in" but the ones for map objects and the like do? – Animated Aug 21 '23 at 19:51
  • @Animated So if you got an explicit iterator and subsequently reused it, for example if you did `it = iter(mylist); print(x in it); [do something else with it]` you'd get the same behaviour you see with the `map` object. – slothrop Aug 21 '23 at 20:12
  • @slothrop that makes so much sense, thanks! – Animated Aug 21 '23 at 20:14