
This is a question about the correct terminology used for "generators". Let's look at the file object returned by the builtin function open().

1. The builtin open() function, official documentation

In the official Python documentation, the open() function is said to return a "file object", and the documentation for file objects does not really say what kind of creature this is, other than that it has read() and write() methods and that

File objects are also called file-like objects or streams.

Well, that's helpful, right?

2. Words from the internet

Here are some examples where the file object returned by open() is called a generator.

2.1. How to Use Generators and yield in Python (Realpython.com)

(emphasis mine)

open() returns a generator object that you can lazily iterate through line by line

2.2. Lazy Method for Reading Big File in Python?

(Accepted answer with 400+ score, emphasis mine)

If the file is line-based, the file object is already a lazy generator of lines:

for line in open('really_big_file.dat'):
    process_data(line)

2.3. Generators in Python — 5 Things to Know (medium.com)

(emphasis mine)

using the open() method to open the EEG file will create a file object, which functions as a generator that yields a line of data as string each time.

More examples like these are easy to find elsewhere on the Internet.

3. Testing if file object returned by open() is a generator

Following the answers to "How to check if an object is a generator object in python?", we can run a few tests on the file object:

In [7]: o = open(r'C:\tmp\test.csv')

In [8]: type(o)
Out[8]: _io.TextIOWrapper

In [9]: import inspect

In [10]: inspect.isgenerator(o)
Out[10]: False

In [12]: inspect.isgeneratorfunction(o)
Out[12]: False

In [13]: import types

In [14]: isinstance(o, types.GeneratorType)
Out[14]: False

All of these checks return False, which suggests that the file object returned by open() is not a generator. Still, many people tend to call it a generator.
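
The same checks can be extended with the abstract base classes in `collections.abc`, which distinguish "iterable", "iterator" and "generator". This is only a sketch: the `describe` helper, the temporary file and its contents are assumptions made for illustration, not part of the original session.

```python
import collections.abc
import inspect
import os
import tempfile


def describe(obj):
    """Report which iteration-related protocols an object satisfies."""
    return {
        "iterable": isinstance(obj, collections.abc.Iterable),
        "iterator": isinstance(obj, collections.abc.Iterator),
        "generator": isinstance(obj, collections.abc.Generator),
        "inspect.isgenerator": inspect.isgenerator(obj),
    }


# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n")
    path = tmp.name

try:
    with open(path) as f:
        file_report = describe(f)
finally:
    os.remove(path)

# A generator expression, for comparison with the file object.
gen_report = describe(line for line in ["a,b\n"])

print(file_report)
print(gen_report)
```

If this runs as expected, the file object should report as an iterable and an iterator but not as a generator, while the generator expression satisfies all four checks.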

4. Generators included – or not?

So, fellow Pythonistas, is it correct to say that the open() function returns a generator? And does the following

for line in open('file.csv'):
    do_something(line)

involve usage of generators?

Niko Föhr
  • Since you can `seek` an open file: clearly not. – deceze Jul 13 '20 at 11:03
  • The [source code](https://github.com/python/cpython/blob/942f7a2dea2e95a0fa848329565c0d0288d92e47/Lib/_pyio.py#L2537) for the `__next__` method of `TextIOWrapper` says that `TextIOWrapper` is a subclass of `TextIOBase`, which is a subclass of `IOBase`. It [also says](https://github.com/python/cpython/blob/942f7a2dea2e95a0fa848329565c0d0288d92e47/Lib/_pyio.py#L338) that "IOBase object can be iterated over *yielding* the lines in a stream." There they use the term "yield". Does that mean there is some generator involved when reading the lines using `for line in open(file)`? – Niko Föhr Jul 13 '20 at 11:22
  • In that paragraph it explicitly uses the word *iterator*, not generator. The word “yield” is likely used in the English sense, not in the Python-keyword sense. – deceze Jul 13 '20 at 11:26
  • You can also do: `l = [1, 2, 3] ; for num in l: ...` That doesn't make the list a generator, it is an **iterable**. Just like the file object. You can use it to *behave* like a generator, but the difference is, as said above, that you can `seek` back on a file. You **can't** rewind a generator. – Tomerikoo Jul 13 '20 at 11:26
  • It makes sense. I checked the source code of `TextIOWrapper` and it seems that in addition to being an iterable it is also an *iterator* (it has an `__iter__` method that returns `self`). – Niko Föhr Jul 13 '20 at 12:05
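
The two points raised in the comments (a file can be rewound with `seek`, and its `__iter__` returns the object itself) can be checked with a short sketch. The temporary file, its contents and the `two_lines` generator are assumptions made for illustration.

```python
import os
import tempfile


def two_lines():
    """A real generator, for comparison with the file object."""
    yield "first\n"
    yield "second\n"


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

try:
    with open(path) as f:
        is_own_iterator = iter(f) is f  # __iter__ returns the file object itself
        first_pass = list(f)            # consume every line
        exhausted = list(f)             # nothing is left at end of file
        f.seek(0)                       # rewind the underlying stream
        second_pass = list(f)           # the lines are available again
finally:
    os.remove(path)

gen = two_lines()
gen_first_pass = list(gen)
gen_second_pass = list(gen)             # an exhausted generator stays exhausted
gen_has_seek = hasattr(gen, "seek")     # generators offer no way to rewind
```

The file object can be read a second time after `seek(0)`, whereas the exhausted generator has to be recreated by calling `two_lines()` again.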

1 Answer

In text mode (the default), Python's open() function returns a TextIOWrapper object, which is not a generator.

You can still iterate over the object because it implements the iterator protocol: it defines a __next__ method (and an __iter__ method that returns the object itself). You can find the source code for it here; it will help clear things up.
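
To illustrate the distinction, here is a minimal sketch of a class that supports `for` loops purely through `__iter__` and `__next__`, with no `yield` anywhere. `LineReader` is a hypothetical name chosen for this example, and it is far simpler than the real `TextIOWrapper`.

```python
import inspect
import io


class LineReader:
    """Hypothetical line iterator over a stream (not the real TextIOWrapper)."""

    def __init__(self, stream):
        self._stream = stream

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        line = self._stream.readline()
        if not line:
            # readline() returns an empty string at end of stream.
            raise StopIteration
        return line


reader = LineReader(io.StringIO("a,b\n1,2\n"))
lines = [line for line in reader]
reader_is_generator = inspect.isgenerator(reader)

print(lines)
print(reader_is_generator)
```

The `for` machinery only needs `__next__` to return items and eventually raise `StopIteration`; whether a generator produced the object is irrelevant to it.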

Ahmed Tounsi
  • Thanks! See my comment on the question; there is a mention that "IOBase object can be iterated over *yielding* the lines in a stream." Does this mean that there is a generator involved when using `for line in open(file): ..`? – Niko Föhr Jul 13 '20 at 11:25
  • @np8 I linked the source code with my answer, the `IOBase` class does not define a `__next__` method. The word yielding in the documentation is referring to subclasses of the `IOBase` class and does not imply a generator involved. – Ahmed Tounsi Jul 13 '20 at 11:27
  • The direct return type of open is a `TextIOWrapper`, it is not a generator. – Ahmed Tounsi Jul 13 '20 at 11:28