Let's consider a file:
$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt
This is a foo bar sentence .
And this is the first txtfile in the corpus .
When I want to read the file character by character, I can do this (https://stackoverflow.com/a/25071590/610569):
>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1,1)
...     print fin.read(1),
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .
But a while loop looks a little unpythonic, especially when I use fin.read(1) to check for EOF and then backtrack in order to read the current byte. So I can do something like this instead (from "How to read a single character at a time from a file in Python?"):
>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .
But when I tried it without the second argument, it throws a TypeError:
>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable
What is the 2nd argument in iter? The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter
As per the doc:
Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
I guess the docs require some "decrypting":
- Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)
Does that mean it needs to come from collections? Or is it okay as long as the object has an __iter__() method?
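To make this point concrete, here is a minimal sketch of what I think the docs mean. The class name Countdown is made up for illustration, and it deliberately does not use anything from collections; as far as I can tell, having an __iter__() method that returns an iterator is enough:

```python
class Countdown(object):
    """Hypothetical class; it does not inherit from anything in collections."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator function, so each call returns a fresh iterator.
        n = self.start
        while n > 0:
            yield n
            n -= 1


# iter() with one argument simply asks the object for its iterator.
countdown_values = list(iter(Countdown(3)))
print(countdown_values)  # [3, 2, 1]
```

If that reading is right, "collection object" in the docs is just loose wording for "any object that implements __iter__()".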
- , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)
That's rather cryptic. Does that mean it checks whether the object can be indexed, and that the indices must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?
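Here is a small experiment for this point, as a sketch only. Both class names (Squares and SparseKeys) are hypothetical; neither defines __iter__(), so iter() has to fall back on __getitem__(). My understanding is that the fallback iterator calls __getitem__(0), __getitem__(1), __getitem__(2), ... and stops at the first IndexError:

```python
class Squares(object):
    """Hypothetical class with __getitem__() only; no __iter__(), no __len__()."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # this is what ends the iteration
        return index * index


class SparseKeys(object):
    """Hypothetical class that only answers for the keys 0, 2 and 8."""

    def __getitem__(self, index):
        if index not in (0, 2, 8):
            raise IndexError(index)
        return "item %d" % index


squares_values = list(iter(Squares()))
sparse_values = list(iter(SparseKeys()))
print(squares_values)  # [0, 1, 4, 9]
print(sparse_values)   # ['item 0']
```

So non-sequential indices do not raise an error, but the iteration ends silently at the first gap: index 1 raises IndexError, and the items at 2 and 8 are never reached.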
- If it does not support either of those protocols, TypeError is raised.
Yes, this part, I do understand =)
- If the second argument, sentinel, is given, then object must be a callable object.
Okay, now this gets a little sci-fi. Is sentinel just Python terminology? What does sentinel mean Pythonically? And does "callable object" mean something like a function, as opposed to a type object?
- The iterator created in this case will call object with no arguments for each call to its __next__() method;
I don't really get this part; maybe an example would help.
- if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
Okay, so sentinel here refers to some breaking criteria?
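To check my reading of the two-argument form, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened test.txt (an assumption, so that the snippet is self-contained), and read_one_char is a hypothetical helper that does the same job as functools.partial(fin.read, 1):

```python
import io

# io.StringIO stands in for the opened test.txt so the sketch is self-contained.
fin = io.StringIO(u"foo bar")


def read_one_char():
    # A zero-argument callable, equivalent to functools.partial(fin.read, 1).
    return fin.read(1)


# iter(callable, sentinel) keeps calling read_one_char() and stops as soon as
# the returned value equals the sentinel. read(1) returns '' at EOF.
chars = list(iter(read_one_char, u''))
print(chars)

# The sentinel does not have to mean EOF; any value works as a stop marker.
fin.seek(0)
first_word = u''.join(iter(read_one_char, u' '))
print(first_word)
```

If this is right, the sentinel is simply the "stop" value: the iterator compares each result with it using ==, raises StopIteration on a match, and never yields the sentinel itself.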
Can someone help to decipher/clarify the meaning of the above points about iter?