The following code iterates over an object created by

matches = pattern.finditer(text_to_search)

Later I try to view this object as a list, but the list is empty. When I move the list() call before the loop, I get the correct list, but then the loop itself produces no output. I think this is similar to reading a file, except that with a file you can use the .seek() method to go back to the beginning. I don't know what to do with the callable iterator, since the same method doesn't apply here. Sorry for the poor explanation of the problem; I don't know how to describe it better.

import re

text_to_search = '''
abcdefghijklmnopqurtuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )

coreyms.com

321-555-4321
123.555.1234abc
123*555*1234
800-555-1234
900-555-1234

Mr. Schafer
Mr Smith
Ms Davis
Mrs. Robinson
Mr. T
'''

sentence = 'Start a sentence and then bring it to an end'

pattern = re.compile(r'abc')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)

print(matches)

print(list(matches))

code result:

<re.Match object; span=(1, 4), match='abc'>
<re.Match object; span=(180, 183), match='abc'>
<callable_iterator object at 0x00000269F1DD5390>
[]

!! list is empty !!

Modified code for the test:

print(list(matches))  # this line was moved up...
for match in matches:
    print(match)

print(matches)

# ...from here

which gives the result:

[<re.Match object; span=(1, 4), match='abc'>, <re.Match object; span=(180, 183), match='abc'>]
<callable_iterator object at 0x00000116F35D5390>

!! the output of the for loop is missing from the result !!

  • Any suggestion for a better title for the topic? – DScounterGO Jul 05 '22 at 15:37
  • `list(...)` doesn't "view the object as a list". It *consumes* the iterator to *create* a new list. But you've consumed the iterator with the `for` loop before you ever call `list(matches)`. – chepner Jul 05 '22 at 15:40
  • @DScounterGO Yeah, I have a few suggestions: 1) It's not important that the iterator is callable. 2) *"itering" should be "iterating". 3) It'd be clearer to replace "useless" with "empty". 4) IMHO, you don't need to mention the language since it's evident from the tags. 5) Lastly, the grammar could be improved. So I'd write, "Why is an iterator empty after iterating through?" – wjandrea Jul 05 '22 at 15:55
  • @wjandrea Thanks, title updated. I prefer not to touch the grammar so as not to lose the clarity of the question. English is not my mother tongue. – DScounterGO Jul 05 '22 at 16:07
  • I mean language.. – DScounterGO Jul 06 '22 at 14:20

2 Answers

finditer returns an iterator. By definition, an iterator can be iterated through once, producing all values, and is then exhausted. File-like objects possessing a seek method are an exception; most iterators cannot be reset.
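
A minimal sketch of that exhaustion, using a short made-up string rather than the text from the question: the first pass consumes both matches, so a second pass over the same iterator finds nothing.

```python
import re

pattern = re.compile(r'abc')
# Sample text is an assumption, chosen only to keep the spans easy to check.
matches = pattern.finditer('abc def abc')

first_pass = [m.span() for m in matches]  # consumes the iterator
second_pass = list(matches)               # nothing is left to consume

print(first_pass)                  # [(0, 3), (8, 11)]
print(second_pass)                 # []
print(next(matches, 'exhausted'))  # exhausted
```

The default passed to next() only avoids a StopIteration exception here; it does not bring the iterator back to life.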

If you need to reuse the iterator many times, convert it to a list or tuple once up-front, storing all the values it produces, then iterate that list/tuple as many times as you like, e.g.:

matches = list(pattern.finditer(text_to_search))  # Eagerly exhaust iterator, converting to list

for match in matches:  # Each use of list for iteration makes new iterator over same data
    print(match)

print(matches)        # Prints the list

print(list(matches))  # Makes an unnecessary copy of the list

In general, Python 3 has moved to favor functions that return views and iterators over functions that return lists. The former can be trivially converted to the latter in the rare cases where a list is needed (just wrap the call in list()). In most cases, you either don't need all the values at once, want to filter or tweak them before storing them, or want to stop before iteration completes. The iterator approach then means fewer large temporaries, shorter delays before processing begins, and the ability to avoid doing the work entirely if you end the loop early.
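
As a rough illustration of stopping early, the sketch below takes only the first match and leaves the rest unread until they are asked for. The phone-number pattern and the sample string are assumptions made for this example.

```python
import re

pattern = re.compile(r'\d{3}-\d{3}-\d{4}')
# Sample data, assumed for illustration.
text = '321-555-4321, 800-555-1234, 900-555-1234'

matches = pattern.finditer(text)

# Pull a single match; the default of None covers the no-match case.
first = next(matches, None)
print(first.group() if first else 'no match')  # 321-555-4321

# The iterator remembers its position, so the remaining matches
# can still be consumed later (once).
rest = [m.group() for m in matches]
print(rest)  # ['800-555-1234', '900-555-1234']
```

If only the first match is ever needed, the rest of the text never has to be turned into a list of match objects.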

You can read more about the difference between iterators and iterables here.

ShadowRanger

As you observed, an iterator works a lot like a file handle, except that there is no concept of being able to seek back to the beginning, because nothing is saved persistently; once the iterator is exhausted, it's gone forever.

If you want to be able to iterate over the matches again, save the list (analogous to saving a file on disk, except it's in memory):

matches = list(pattern.finditer(text_to_search))
for match in matches:
    print(match)

Now you can reuse matches as much as you want, because each item from the original iterator has been saved into the list.
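
If keeping every match in memory is not desirable, one alternative (an assumption beyond what is described above) is to call finditer again whenever a fresh pass is needed. A minimal sketch, with a hypothetical helper named iter_matches and made-up sample text:

```python
import re

pattern = re.compile(r'abc')
text = 'abc def abc'  # sample text, assumed for illustration


def iter_matches():
    """Hypothetical helper: return a brand-new iterator on every call."""
    return pattern.finditer(text)


first_pass = [m.span() for m in iter_matches()]
second_pass = [m.span() for m in iter_matches()]

print(first_pass)                 # [(0, 3), (8, 11)]
print(first_pass == second_pass)  # True
```

The trade-off is that the search is repeated on each call, whereas the list approach searches once and stores the results.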

Samwise