
In Python, when subclassing tuple, the __new__ function is called with self as an argument. For example, here is a paraphrased version of PySpark's Row class:

class Row(tuple):
    def __new__(self, args):
        return tuple.__new__(self, args)

But help(tuple) shows no self argument to __new__:

  __new__(*args, **kwargs) from builtins.type
      Create and return a new object.  See help(type) for accurate signature.

and help(type) just says the same thing:

__new__(*args, **kwargs)
      Create and return a new object.  See help(type) for accurate signature.

So how does self get passed to __new__ in the Row class definition?

  • Is it via *args?
  • Does __new__ have some subtlety where its signature can change with context?
  • Or, is the documentation mistaken?

Is it possible to view the source of tuple.__new__ so I can see the answer for myself?

My question is not a duplicate of this one because in that question, all discussion refers to __new__ methods that explicitly have self or cls as first argument. I'm trying to understand:

  1. Why the tuple.__new__ method does not have self or cls as first argument.
  2. How I might go about examining the source code of the tuple class, to see for myself what's really going on.

Follow-up: Moderators closed this old question as a duplicate of this one. But it's not a duplicate. Look at the accepted answer on this question and note how little overlap it has with the answers in the claimed duplicate, in terms of the information provided.

Paul
  • The very documentation that you quote says that this is not the full accurate signature. – Daniel Roseman Dec 25 '15 at 15:33
  • I can see that, but help(type) does not provide any more information, just the same signature and the same comment about accurate signature. So I'm still mystified. – Paul Dec 25 '15 at 15:43
  • The first argument of `__new__` is not an instance but the class. Thus, it's usually named `cls`, not `self`. Under the hood, `tuple() == tuple.__new__(tuple)`, and `tuple(iterable) == tuple.__new__(tuple, iterable)`. – GingerPlusPlus Dec 25 '15 at 16:06
  • The [docs](https://docs.python.org/2/reference/datamodel.html#object.__new__) say that `object.__new__(cls[, ...])` is "Called to create a new instance of class `cls`. `__new__()` is a static method (special-cased so you need not declare it as such) _that takes the class of which an instance was requested as its first argument_." (emphasis mine). The value it returns will become the `self` passed on to other methods (unlike for example `__init__()` which doesn't have a return value). – martineau Dec 25 '15 at 16:23
  • [How is tuple implemented in CPython?](http://stackoverflow.com/questions/14135542/how-is-tuple-implemented-in-cpython) – GingerPlusPlus Dec 25 '15 at 16:31
  • Piecing these comments together, it appears the answer is that `cls` is being passed via `*args`, since `object.__new__` must always be passed `cls`. Thanks @martineau and @GingerPlusPlus. – Paul Dec 25 '15 at 16:41

1 Answer


The correct signature of tuple.__new__

Functions and types implemented in C often can't be inspected, and their signatures often look like that one.

The correct signature of tuple.__new__ is:

__new__(cls[, sequence])

For example:

>>> tuple.__new__(tuple)
()
>>> tuple.__new__(tuple, [1, 2, 3])
(1, 2, 3)

Not surprisingly, this is exactly like calling tuple(), except that you have to write tuple twice.


The first argument of __new__

Note that the first argument of __new__ is always the class, not the instance. In fact, the role of __new__ is to create and return the new instance.

The special method __new__ is a static method.

I'm saying this because in your Row.__new__ I can see self: while the name of the argument is not important (except when using keyword arguments), beware that self will be Row or a subclass of Row, not an instance. The general convention is to name the first argument cls instead of self.
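A minimal sketch of that point, using the conventional name cls. The `seen` list and the `SubRow` subclass are illustrative additions, not part of PySpark; they only record what `__new__` actually receives.

```python
seen = []  # illustrative: records the first argument of every __new__ call


class Row(tuple):
    def __new__(cls, args):
        # cls is the class being instantiated; no instance exists yet.
        seen.append(cls)
        return tuple.__new__(cls, args)


class SubRow(Row):
    # Hypothetical subclass, only here to show that cls follows the caller.
    pass


row = Row([1, 2])
sub = SubRow([3])
print(seen)  # the two classes, in call order
```

Here `seen` ends up holding the classes Row and SubRow, never an instance, which is why cls is the less misleading name.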


Back to your questions

So how does self get passed to __new__ in the Row class definition?

When you call Row(...), Python automatically calls Row.__new__(Row, ...).
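As a rough sketch of that automatic call: the helper `construct` below is hypothetical and only approximates what calling a class does (create the object with `__new__`, then run `__init__` if the result is an instance of the class). The real logic lives in CPython's C code and handles more cases.

```python
class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)


def construct(cls, *args, **kwargs):
    """Hypothetical helper approximating what cls(*args, **kwargs) does."""
    # Step 1: the class itself is passed explicitly as the first argument.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: __init__ runs only if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


a = Row([1, 2, 3])             # the normal spelling
b = construct(Row, [1, 2, 3])  # the same steps, written out by hand
```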

  • Is it via *args?

You can write your Row.__new__ as follows:

class Row(tuple):
    def __new__(*args, **kwargs):
        return tuple.__new__(*args, **kwargs)

This works and there's nothing wrong with it. It's very useful if you don't care about the arguments.

  • Does __new__ have some subtlety where its signature can change with context?

No, the only special thing about __new__ is that it is a static method.
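One way to observe the static-method behaviour, as a small sketch (the variable names are illustrative): a `__new__` defined in a class body is stored in the class dictionary as a staticmethod, so nothing is bound automatically when you look it up on the class.

```python
class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)


# Look in the class dict to bypass normal attribute lookup.
raw = Row.__dict__["__new__"]
is_static = isinstance(raw, staticmethod)

# Because it is static, the class must be passed by hand here.
row = Row.__new__(Row, [1, 2])

# Leaving the class out means the list is taken as cls and args is missing.
try:
    Row.__new__([1, 2])
    forgot_cls_failed = False
except TypeError:
    forgot_cls_failed = True
```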

  • Or, is the documentation mistaken?

I'd say that it is incomplete or ambiguous.

  • Why the tuple.__new__ method does not have self or cls as first argument.

It does have one; it just doesn't appear in help(tuple.__new__), because that information is often not exposed by functions and methods implemented in C.
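A short check of this, assuming CPython: the standard inspect module reports tuple.__new__ as a built-in implemented in C, and asking for its Python source fails because there is none.

```python
import inspect

# True for functions and methods implemented in C.
is_c_builtin = inspect.isbuiltin(tuple.__new__)

try:
    inspect.getsource(tuple.__new__)
    has_python_source = True
except (TypeError, OSError):
    # No Python source exists for a C-implemented built-in.
    has_python_source = False

print(is_c_builtin, has_python_source)
```

This is why the answer to "can I view the source" points at the C files rather than at anything reachable from the interpreter.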

  • How I might go about examining the source code of the tuple class, to see for myself what's really going on.

The file you are looking for is Objects/tupleobject.c. Specifically, you are interested in the tuple_new() function:

static char *kwlist[] = {"sequence", 0};
/* ... */
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O:tuple", kwlist, &arg))

Here "|O:tuple" means: the function is called "tuple" and it accepts one optional argument (| delimits optional arguments, O stands for a Python object). The optional argument may be set via the keyword argument sequence.
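From the Python side, the effect of "one optional argument" can be sketched like this. The helper `try_new` is illustrative; the example sticks to positional arguments, on the assumption that the keyword spelling may not be accepted by every Python version.

```python
def try_new(*args):
    """Illustrative helper: call tuple.__new__ and return the result or the error."""
    try:
        return tuple.__new__(tuple, *args)
    except TypeError as exc:
        return exc


no_arg = try_new()             # the argument is optional
one_arg = try_new("ab")        # one iterable is accepted
two_args = try_new("ab", "cd") # a second argument is rejected
```

In this sketch `no_arg` is the empty tuple, `one_arg` is built from the iterable, and `two_args` holds a TypeError.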


About help(type)

For reference, you were looking at the documentation of type.__new__, whereas you should have stopped at the first four lines of help(type):

In the case of type.__new__() the correct signature is the signature of type():

class type(object)
 |  type(object_or_name, bases, dict)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type

But this is not relevant, as tuple.__new__ has a different signature.


Remember super()!

Last but not least, try to use super() instead of calling tuple.__new__() directly.
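A minimal sketch of the Row class rewritten that way, assuming Python 3 (where super() can be called without arguments). Note that cls is still passed explicitly, because __new__ is a static method.

```python
class Row(tuple):
    def __new__(cls, args):
        # super() finds the next __new__ in the MRO (tuple's here);
        # cls must still be handed over by hand.
        return super().__new__(cls, args)


row = Row([1, 2, 3])
print(type(row).__name__, row)
```

The benefit over naming tuple directly is that the call keeps working if the base class changes or the class takes part in multiple inheritance.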

Andrea Corbellini
  • What if I want to redefine its parameters? \n tuple2 inherited from tuple – 喵喵喵 May 19 '17 at 09:46
  • What if I want to redefine its parameters? tuple2 inherits from tuple. It acts like: tuple2(2) == (0,0), tuple2(5) == (0,0,0,0,0). However, tuple.__new__ accepts only an iterable rather than an int, so __new__ does all the evaluation and we can do nothing in __init__. – 喵喵喵 May 19 '17 at 09:54
  • _(:з」∠)_don't know how to make a line break – 喵喵喵 May 19 '17 at 09:55
  • @喵喵喵 you can define your own `__new__` like this: `def __new__(cls, size): return super().__new__(cls, (0,) * size)` – Andrea Corbellini May 19 '17 at 12:47
  • @AndreaCorbellini If I write (0,) * size in __new__, I have assigned it a default value. Since tuple is immutable, no further assignment can be made. – 喵喵喵 May 24 '17 at 08:34
  • I mean, if I modify an immutable subclass, must all assignments be made in '__new__'? – 喵喵喵 May 24 '17 at 08:49
  • @喵喵喵 as you said, tuples are immutable. If you want something that can be changed, use list or array or define your own class: you're not forced to inherit from builtin types, in fact I would discourage it if you're going to change the semantics drastically – Andrea Corbellini May 24 '17 at 16:27