Many Python built-in "functions" are actually classes, even though they could be implemented as straightforward functions. This is true even of very simple ones, such as itertools.repeat. What is the motivation for this? It seems like over-engineering to me.

Edit: I am not asking about the purpose of itertools.repeat or any other particular function. It was just an example of a very simple function with a very simple possible implementation:

def repeat(x):
    while True: yield x

But itertools.repeat is not actually a function, it's implemented as a class. My question is: Why? It seems like unnecessary overhead.

Also, I understand that classes are callable and that you can emulate function-like behavior using a class. But I don't understand why this is so widely used throughout the standard library.

Eugene Yarmash
user1747134

3 Answers

Implementing the itertools tools as classes has some advantages that generator functions don't have. For example:

  1. CPython implements these built-ins at the C layer, and at the C layer, a generator "function" is best implemented as a class implementing __next__ that preserves state as instance attributes; yield based generators are a Python layer nicety, and really, they're just an instance of the generator class (so they're actually still class instances, like everything else in Python)
  2. Generators aren't pickleable or copyable, and don't have a "story" for supporting either behavior (their internal state is too complex and opaque to generalize); a class can define __reduce__/__copy__/__deepcopy__ (and if it's a Python level class, it probably doesn't even need to do that; it will work automatically) and make its instances pickleable/copyable (so if you have already generated 5 elements from a range iterator, you can copy or pickle/unpickle it, and get an iterator the same distance along in iteration)
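
As a rough illustration of point 2, here is a minimal sketch in pure Python. The names `CountUp` and `count_up` are hypothetical and exist only for this example; they are not how CPython implements anything. The class keeps its position in an instance attribute, so `copy.copy` can duplicate it mid-iteration, while the equivalent generator cannot be copied:

```python
import copy


class CountUp:
    """Hypothetical class-based iterator; its state is a plain attribute."""

    def __init__(self, start=0):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        value = self.current
        self.current += 1
        return value


def count_up(start=0):
    """Generator version of the same iterator (state is hidden in the frame)."""
    current = start
    while True:
        yield current
        current += 1


it = CountUp()
consumed = [next(it) for _ in range(5)]  # advance five steps

# The copy resumes from the same position as the original.
clone = copy.copy(it)
next_from_clone = next(clone)
next_from_original = next(it)

# Copying a generator object fails with TypeError.
gen = count_up()
try:
    copy.copy(gen)
    generator_copyable = True
except TypeError:
    generator_copyable = False

print(consumed, next_from_clone, next_from_original, generator_copyable)
```

The Python-level class needed no `__copy__` or `__reduce__` at all; the default object machinery handled it, which matches the "it will work automatically" remark above.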

For non-generator tools, the reasons are usually similar. Classes can be given state and customized behaviors that a function can't. They can be inherited from (if that's desired, but C layer classes can prohibit subclassing if they're "logically" functions).

It's also useful for dynamic instance creation; if you have an instance of an unknown class but a known prototype (say, the sequence constructors that take an iterable, or chain or whatever), and you want to convert some other type to that class, you can do type(unknown)(constructorarg); if it's a generator, type(unknown) is useless, you can't use it to make more of itself because you can't introspect to figure out where it came from (not in reasonable ways).
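
A small sketch of the `type(unknown)(constructorarg)` idea, assuming the unknown object follows the usual "constructor takes an iterable" prototype. The helper name `rebuild_like` is made up for this example:

```python
from collections import deque


def rebuild_like(template, items):
    """Hypothetical helper: build a new object of the same class as `template`."""
    return type(template)(items)


as_list = rebuild_like([9, 9], range(3))
as_deque = rebuild_like(deque(), "ab")
as_frozenset = rebuild_like(frozenset(), [1, 1, 2])


def numbers():
    yield 1


# For a generator, type() only gives the generic generator type,
# which cannot be called to create "more of the same".
gen = numbers()
try:
    type(gen)()
    generator_type_callable = True
except TypeError:
    generator_type_callable = False

print(as_list, as_deque, as_frozenset, generator_type_callable)
```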

And beyond that, even if you never use these features in your program logic, what would you rather see for type(myiter) in the interactive interpreter or when print debugging: <class 'generator'>, which gives no hint as to its origin, or <class 'itertools.repeat'>, which tells you exactly what you have and where it came from?
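
To see the difference concretely on Python 3 (the generator function `repeat_gen` is just the question's example under a different name):

```python
import itertools


def repeat_gen(x):
    while True:
        yield x


# repr() of the type is what the interpreter shows for type(myiter).
class_based = repr(type(itertools.repeat("x")))
generator_based = repr(type(repeat_gen("x")))

print(class_based)      # <class 'itertools.repeat'>
print(generator_based)  # <class 'generator'>
```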

ShadowRanger

Both functions and classes are callables, so they can be used interchangeably in higher-order functions, for example.

$ python2
... 
>>> map(dict, [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]
>>> map(lambda x: dict(x), [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]

That said, classes can also define methods that you can later call on the returned objects. For instance, the dict class defines the .get() method for dictionaries, etc.
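
For reference, a Python 3 version of the session above might look like the following sketch (in Python 3, `map` returns a lazy iterator, so it is wrapped in `list`). It also shows a method call on one of the returned objects:

```python
pairs = [["ab"], ["cd"], ["ef"]]

# The class and the lambda are interchangeable as callables.
via_class = list(map(dict, pairs))
via_lambda = list(map(lambda x: dict(x), pairs))

# Because dict is a class, the results carry its methods, such as .get().
first = via_class[0]
found = first.get("a")
missing = first.get("z", "default")

print(via_class, via_lambda, found, missing)
```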

Eugene Yarmash
  • Which makes the question even more puzzling: why is 'itertools.repeat' implemented as a class when it isn't used for instantiating objects? – strubbly Oct 21 '16 at 10:41
  • Sorry - to clarify my comment - I think you need to expand the answer so that a less expert reader can understand that it makes sense in the given example because the `itertools.repeat` class returns an `itertools.repeat` object when called. – strubbly Oct 21 '16 at 11:03
  • @strubbly [itertools.repeat](https://docs.python.org/2.7/library/itertools.html#itertools.repeat) returns an iterator, i.e. an instance of a class that supports the iteration protocol. – Eugene Yarmash Oct 21 '16 at 11:20
  • Well yes, but what is surprising is that it is an object of type `itertools.repeat` rather than an object of, for example, type `generator`. – strubbly Oct 21 '16 at 12:33

In the case of itertools.repeat (and most iterators), using a proper class that implements the iterator protocol has a few advantages from the implementation/maintenance POV: you have better control over the iteration, you can specialize the class, etc. I also suspect that some optimisations can be done at the C level for proper iterators that don't apply to generators.

Also remember that classes and functions are objects too - the def statement is mostly syntactic sugar for creating a function instance and populating it with compiled code, local namespace, cells, closures and whatnot (a somewhat involved task FWIW; I did it once just out of curiosity and it was a major PITA), and the class statement is also syntactic sugar for creating a new type instance (doing it manually happens to be really trivial, actually). From this POV, yield is similar syntactic sugar that turns your function into a factory returning instances of the generic generator builtin type - IOW it makes your function act like a class, without the hassle of writing a full-blown class but also without the fine control and possible optimisations you can get by writing a full-blown class.
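
A minimal sketch of the three points above, using only the standard `types` module and the `type` builtin. The names `template`, `manual_double`, and `Point` are made up for illustration, and the function case is deliberately simplified: it reuses an already compiled code object rather than building one by hand.

```python
import types


def template(x):
    return x * 2


# Roughly what `def` does: wrap a compiled code object in a function object.
# An empty globals dict is enough here because the code uses no globals.
manual_double = types.FunctionType(template.__code__, {}, "manual_double")


# Roughly what `class` does: call type(name, bases, namespace).
def _init(self, x, y):
    self.x = x
    self.y = y


def _describe(self):
    return f"Point({self.x}, {self.y})"


Point = type("Point", (object,), {"__init__": _init, "describe": _describe})


# `yield` turns the function into a factory of generic generator instances.
def repeat(x):
    while True:
        yield x


gen = repeat("x")

print(manual_double(21), Point(1, 2).describe(), isinstance(gen, types.GeneratorType))
```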

On a more general level, sometimes writing your "function" as a custom callable type instead offers similar gains - fine control, possible optimisations, and well, sometimes just better readability (think of two-step decorators, custom descriptors etc).
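
As one possible illustration of the two-step decorator case, here is a sketch written as a callable class instead of nested functions. The decorator name `call_times` and the function `greet` are hypothetical, chosen only for this example:

```python
import functools


class call_times:
    """Hypothetical two-step decorator: step 1 stores the argument,
    step 2 (__call__) receives the function to wrap."""

    def __init__(self, times):
        self.times = times

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Call the wrapped function `times` times and collect the results.
            return [func(*args, **kwargs) for _ in range(self.times)]

        return wrapper


@call_times(3)
def greet(name):
    return f"hello {name}"


result = greet("world")
print(result)
```

Compared with three levels of nested functions, the configuration lives in `__init__` and the wrapping logic in `__call__`, which some readers find easier to follow.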

Finally, wrt/ builtin types (int, str etc.), IIRC (please someone correct me if I'm wrong) they were originally factory functions (before the new-style classes revolution, when builtin types and user-defined types were different kinds of objects). It of course makes sense to have them as plain classes now, but they had to keep the all_lower naming scheme for compatibility.

bruno desthuilliers
  • I think you mean that a `class` statement is syntactic sugar that creates a `type` object. There are no `class` objects, just `type`s. – Blckknght Oct 21 '16 at 11:28