Does re.finditer() store the values it finds in memory?

Question

import re

txt = 'programming is a beautiful world'

memory = re.finditer('beautiful', txt)

print(memory)

result :

<callable_iterator object at 0x0000021DB7C1B5B0>

I'm wondering whether re.finditer() stores the values it finds in memory, because when I print the result I see some kind of memory address. I'm really confused. Or does it use classes?

tripleee · Accepted Answer · 2022-09-02T18:31:08.553

3

No, the whole point of an iterator is that it returns one match at a time. The iterator is useful when you want to process one match at a time in isolation, precisely because it avoids returning more than you actually consume. If you bail out after processing the first match, the iterator will never even reach the point of searching for the second.
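As a rough illustration of that laziness, here is a small sketch. The sample text and variable names are made up for this example, and the comment about when the search runs reflects my understanding of CPython rather than a documented guarantee.

```python
import re

txt = 'one beautiful world, two beautiful worlds'

# Creating the iterator does not collect any matches yet.
matches = re.finditer('beautiful', txt)

# Each next() call searches only as far as the next match.
first = next(matches)
print(first.group(), first.span())

second = next(matches)
print(second.span())

# No third match: with a default, next() returns it instead of raising StopIteration.
print(next(matches, None))
```

If you stop after `first`, the second occurrence is simply never looked for.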

If you are going to store all the matches in a list in the caller anyway, you should probably just use findall instead of finditer, since you are forgoing the possible benefits of the iterator.
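For comparison, a minimal sketch of the two approaches when you do want everything at once. The `\w+` pattern is just an assumption for the demo. Note that findall gives you strings (or tuples, if the pattern has groups), while finditer gives you match objects.

```python
import re

txt = 'programming is a beautiful world'

# findall returns a fully built list of the matched strings.
words = re.findall(r'\w+', txt)
print(words)

# finditer yields match objects; the comprehension consumes them all,
# so nothing is saved compared to findall here.
spans = [m.span() for m in re.finditer(r'\w+', txt)]
print(spans)
```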

The memory address which Python prints is just the address of the iterator object. Every object in Python has an address in memory, though the repr of many other objects says more about what they actually contain. This is just Python's default repr of internal objects; it includes the memory address mainly as a unique identifier, so that a human reader can tell whether you printed the same object twice, or two different objects of the same type.
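A quick way to see the address acting as an identity marker, sketched under the assumption that you are running CPython (the exact repr text is an implementation detail):

```python
import re

txt = 'programming is a beautiful world'

a = re.finditer('beautiful', txt)
b = re.finditer('beautiful', txt)
c = a  # a second name for the same iterator, not a copy

print(a)        # default repr: type name plus an address
print(b)        # a different object, so a different address
print(a is b)   # False
print(a is c)   # True: same object, same repr
```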

Storing the iterator in a variable is sometimes useful, though in this case, it looks like you expected it to contain the matches, not the iterator. This is a common beginner error with iterators and generators.

Understanding the yield keyword should give you a basic grip on generators, and thus iterators. Perhaps review What does the "yield" keyword do?

In so many words, Python keeps track of the state of the iterator between calls. The next time you ask it for a match (by calling next() on it, or implicitly through a for loop), it recalls where it stopped searching the previous time, and continues searching from there.
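To make the "remembers where it stopped" idea concrete, here is a hypothetical generator written with yield. This is not how re.finditer is implemented; `find_all` is a name invented for this sketch, it only handles plain substrings, and it assumes a non-empty `needle`.

```python
def find_all(needle, haystack):
    """Yield the start index of each non-overlapping occurrence of needle."""
    pos = haystack.find(needle)
    while pos != -1:
        yield pos  # execution pauses here until the next value is requested
        pos = haystack.find(needle, pos + len(needle))

hits = find_all('beautiful', 'one beautiful world, two beautiful worlds')
print(next(hits))  # 4
print(next(hits))  # 25
```

The local variable `pos` survives between the two next() calls, which is exactly the state the iterator needs.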

You could do the same thing without yield, too. Here is a simple Python example: a function that returns another function, which in turn returns one item at a time from a list.

def stupid_iterator(seq):
    """Make a crude iterator for seq"""
    items = list(seq)  # copy, so the caller's list is not emptied
    def _iter():
        # the closure remembers items between calls
        return items.pop(0)
    return _iter

i = stupid_iterator([1, 2, 3])
print(i)  # for fun; notice memory address in repr
print(i())
print(i())
print(i())
print(i())  # fails with IndexError: list is empty

Demo: https://ideone.com/qc5XfQ
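For completeness, the same state could live in a class that follows Python's iterator protocol. This is only a sketch; `ListIterator` is a name invented for illustration, and unlike the closure above it signals exhaustion with StopIteration, so it works in a for loop.

```python
class ListIterator:
    """Hypothetical hand-written iterator over a list."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0  # the remembered position between calls

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

it = ListIterator([1, 2, 3])
print(it)        # again just a type name and an address
print(next(it))  # 1
for item in it:  # continues from where next() left off
    print(item)
```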

edited Sep 02 '22 at 18:31

answered Sep 02 '22 at 12:43

tripleee


*"The next time you call it"* - Call what? (Not the iterator. Irritatingly, the "callable_iterator" isn't callable.) – Kelly Bundy Sep 02 '22 at 13:08
Well, call its `__next__` method, but I wanted to keep this exposition simple. – tripleee Sep 02 '22 at 13:16
Actually, this part: 'No, the whole point of an iterator is that it returns one match at a time.' is what I wanted an answer to. – Sep 02 '22 at 15:51
I still didn't get the right answer to my question; that's why I haven't accepted it yet. – Sep 02 '22 at 16:47
So which part of your question do you feel is still unanswered? There are no "classes" here as such, though an iterator is an `object`, in the sense that everything in Python is an object. – tripleee Sep 02 '22 at 17:02
The implementation in https://github.com/python/cpython/blob/3.10/Lib/re.py vaguely exposes some internal C structures as Python objects, but IMHO it would not be fair to say that this is a "class" in any meaningful sense. In the end the real work happens in [the _sre module](https://github.com/python/cpython/blob/8f0fa4bd10aba723aff988720cd26b93be99bc12/Modules/_sre.c#L890) which is implemented in C. – tripleee Sep 02 '22 at 17:27
I mean, how does this iterator work in the background, and how is it able to return such an address? – Sep 02 '22 at 17:31
The address is just Python's internal representation of the iterator. It includes the memory address merely to help a human reader tell the difference if you printed two of them and want to see if they refer to the same object. In terms of internal implementation, the method contains a few internal variables that let it know where it needs to continue. The `yield` keyword makes this easy to do from Python, but it's possible to do with a simple data structure (which in a Python implementation would probably be a class or a closure, but obviously in C just a `struct`). – tripleee Sep 02 '22 at 17:36
So you mean everything that happens behind the scenes is built in C, and no Python classes are used in re.finditer()? – Sep 02 '22 at 17:41
If you really want to dig into it, you can explore those implementation details via the links in the previous comment. But very briefly, yes: in Python 3.10, the Python wrapper is simple and mainly does some charades to look more object-oriented than it really is. See also the latest update to the answer. – tripleee Sep 02 '22 at 18:08
