25

I was writing some code this afternoon, and stumbled across a bug in my code. I noticed that the default values for one of my newly created objects was carrying over from another object! For example:

class One(object):
    def __init__(self, my_list=[]):
        self.my_list = my_list
        
one1 = One()
print(one1.my_list)
[] # empty list, what you'd expect.

one1.my_list.append('hi')
print(one1.my_list)
['hi'] # list with the new value in it, what you'd expect.

one2 = One()
print(one2.my_list)
['hi'] # Hey! It saved the variable from the other One!

So I know it can be solved by doing this:

class One(object):
    def __init__(self, my_list=None):
        self.my_list = my_list if my_list is not None else []

What I would like to know is... Why? Why are Python classes structured so that the default values are saved across instances of the class?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
TorelTwiddler
  • 5,996
  • 2
  • 32
  • 39
  • Weird, reminds me of a prototype chain in JavaScript – Greg Guida Jul 27 '11 at 00:44
  • 3
    This is an aspect of python functions, not classes. Anyway, [this post](http://stackoverflow.com/q/1132941/287976) can be helpful to make it clear why Python is designed this way. – brandizzi Jul 27 '11 at 01:00
  • 1
    It seems like the last few days I keep seeing new versions of this question... – Karl Knechtel Jul 27 '11 at 04:24
  • 1
    Python **functions (whether methods or plain functions) are objects themselves**. The default argument is bound to the parameter name (and shadowed if a call provides an explicit value); its visibility is the function body. Nothing going on at the class-level at all beyond the fact that a method is a member of the defining class. – Lutz Prechelt Mar 30 '15 at 11:37

6 Answers6

10

This is a known behaviour of the way Python default values work, which is often surprising to the unwary. The empty array object [] is created at the time of definition of the function, rather than at the time it is called.

To fix it, try:

def __init__(self, my_list=None):
    if my_list is None:
        my_list = []
    self.my_list = my_list
Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
  • 6
    Note that your solution has a potential bug: If you pass an empty list into your function, intending that the object copy a reference to that list, your `my_list or []` expression will select the *new* empty list `[]` instead of `my_list` (because an empty list is falsy). – Greg Hewgill Jul 27 '11 at 00:46
  • -1: It's not an "issue". It's a matter of definition. – S.Lott Jul 27 '11 at 03:09
  • @S.Lott: Thanks, fixed the terminology. – Greg Hewgill Jul 27 '11 at 03:13
  • 2
    Personally I think `if foo is None: foo = mutable_default` is an antipattern in most cases. Now the function just unexpectedly mutates values explicitly passed in from the outside. Plus you lose the ability to actually pass `None`, which may or may not be meaningful. – Ben Jul 27 '11 at 03:30
  • 2
    @Ben +1, more if I could. I prefer `def func(arg=()): arg = list(arg); proceed()`. Assuming a mutable value is needed in the first place. Consider that we should also let the user pass in a generator, without a compelling reason to forbid it... and typically we will need to copy the data anyway in that case, if we're doing anything other than iterating over it for non-mutating purposes. – Karl Knechtel Jul 27 '11 at 04:27
  • @Ben: -1. Mutable default values can have a purpose for memoization. I haven't seen any examples of "unexpectedly mutates values explicitly passed in from the outside" based on the `if foo is none: foo = []` kind of argument processing. I see "predictable" mutation of argument values. Never "unexpectedly". – S.Lott Jul 27 '11 at 10:00
  • 1
    @S.Lott: I'm not the one saying "never use mutable default arguments". Personally I just use empty lists as my default arguments, and don't mutate arguments to a function at all unless that's part of the documented functionality of the callable. My point is that if you encounter a bug due to a mutable default argument, it's probably a bug in the case where a non-default value is received as well, and `if foo is None: foo = []` does nothing to fix that, and only makes the bug more subtle. – Ben Jul 27 '11 at 12:33
  • @Ben: "it's probably a bug in the case where a non-default value is received as well". I'm hard-pressed to visualize this. I really can't agree that it's even likely. It would have to be a function with very confusing semantics. The idea that there's even some subtlety to this is the real code smell. Nothing to do with mutability. It sounds like it has everything to do with murky function semantics. – S.Lott Jul 27 '11 at 12:44
  • 1
    @S.Lott: Say I have a function that takes a list of strings and writes them to a file, along with some other formatting. I allow the list to default to empty. I discover that calling it multiple times is changing the default, so I apply the `None` default trick. But that was only necessary if the function mutates the list. And if that case, what happens if I pass it a list of strings I want to use again for something else? It's been clobbered. The real problem isn't the default, it's that the function shouldn't be modifying its argument as a side effect of its real purpose. – Ben Jul 27 '11 at 13:29
  • 1
    @S.Lott: And in a lot of real projects, the function is in module A, written by coder A, the caller is in module B written by coder B, and the value originated in module C written by coder C. – Ben Jul 27 '11 at 13:36
  • @Ben: "The real problem isn't the default, it's that the function shouldn't be modifying its argument as a side effect of its real purpose". Correct. That's the real code smell. Great example of what not to do. Nothing to do with the `if foo is none: foo = []`. Everything to do with bad design. Good example of bad design. – S.Lott Jul 27 '11 at 13:55
  • 1
    @Greg Hewgill: Thanks for the suggestion with `my_list or []`. I've changed it in the question. Sorry, but I'm choosing Ben's answer, as it better addresses what he and S.Lott were talking about in the comments here. – TorelTwiddler Jul 27 '11 at 17:19
  • 1
    @S.Lott: Yes, exactly. I just chose to get up on my soap box because I've found that the *need* to use `if foo is `None`: ...` is often an indication of bad design. And everybody loves to help the new Python programmer by explaining that default values don't work that way, and then they go away and stick `if foo is None: ...` everywhere, and think that means its safe to mutate their arguments, and leave little time bombs in their code (which will often never go off, because many calls will use the default, and many more will use values that don't matter after the call, but it's the principle). – Ben Jul 28 '11 at 00:13
  • 1
    @TorelTwiddler: Heh, thanks. Greg's answer is probably a better answer to the question you *asked* (why does this happen). My answer was addressed at an question I think usually *should* be asked along with this question, but often isn't. :) – Ben Jul 28 '11 at 00:15
  • @Ben: the need to use `if foo is None:...` is *never* an indication of bad design. It's *always* the right thing to do. Your examples of bad design are epically bad design. Nothing more. They are not accidents waiting to happen. The number of functions that "suprisingly" mutate a default argument is approximately zero because it's such a bad design. Please stop claiming that it's a universal "problem". "They" are not putting `if foo is None:...` in "their" code randomly and unthinkingly. Please stop claiming "they" are. The claim makes all programmers sound pathetically stupid. – S.Lott Jul 28 '11 at 01:41
  • 1
    @S.Lott: So why is the "mutable default argument trap" such a common source of questions from people learning Python? If people weren't mutating their arguments, they'd never discover that default arguments aren't created fresh for every invocation. You say `if foo is None: ...` is *always* the right thing to do. But it's only needed at all if a function argument with a default value is mutated. I simply claim that it is likely to be incorrect to mutate that argument *whether or not it received the default value on this particular invocation*. – Ben Jul 28 '11 at 02:18
  • @Ben: People aren't mutating their arguments randomly or surprisingly or accidentally. It's a common question because Python's mutability isn't obvious. Mutable arguments are common. Defaults are attractive. Not everyone makes the mistake. Once they see it, they don't *universally* throw random code around. Most seem to figure it out. A few ask here. Then they fix it. Your bad design examples are great examples of bad design. Your further claim (that `if foo is None: foo = mutable` is *universally* bad) is unjustified. Many people use it and understand it. They really do. – S.Lott Jul 28 '11 at 02:28
  • 1
    @S.Lott: I didn't claim it was universally bad. I claimed that being surprised by your default value being mutated is *often* an indication that there is a deeper problem, which is not addressed by `if foo is None: ...`. I also think the cases where arguments have sensible defaults are ones where it is less likely (not impossible) for it to be a great idea to be mutating that argument. So it's not that `None` as a default makes me think the code is bad. It's that someone who is surprised by a mutated default and fixes it by applying a `None` default is likely to still have a potential bug. – Ben Jul 28 '11 at 02:39
  • @Ben: "someone who is surprised by a mutated default and fixes it by applying a None default is likely to still have a potential bug". Correct. And I'm claiming that this situation is vanishingly rare. I've never seen it. You have no examples. However, you've repeated it so many times, that I give up. I've never seen it. However, you've repeated it so many times that I'm forced to agree. Somewhere "they" really are throwing random code at this problem. It must somehow be happening because you claim it is. I've never seen it, but I yield to your exhausting repetition. – S.Lott Jul 28 '11 at 03:06
  • 1
    @S.Lott. This is a question about exactly that issue. We are commenting on an answer that suggests using the `None` default trick to solve it. Another answer to this question links to a relatively well known article which states "There's a Python gotcha that bites everybody as they learn Python. In fact, I think it was Tim Peters who suggested that every programmer get caught by it exactly two times. It is call the *mutable defaults* trap." Karl Knechtel says in a comment on the question "It seems like the last few days I keep seeing new versions of this question." I'm not making this up. – Ben Jul 28 '11 at 03:25
  • @Ben: I've never seen the standard solution become a problem. However, you have posted and reposted that the standard solution creates a problem of some nature. I have no evidence. You have no evidence. All you have is endless repetition of your claim that "they go away and stick `if foo is None: ...` everywhere... and leave little time bombs in their code". You have no evidence. I have no evidence. You have endless repetition backing you up, therefore, you're right. Simply right. Whatever it is that's bad about this solution, I agree. – S.Lott Jul 28 '11 at 09:51
  • 1
    @S.Lott: **By definition** any code that needs a `None` default can modify its arguments, which is bad unless this fact is obvious to callers. That is all I am saying. I have been very clear about that core point, and you have never actually disputed it. But you're right, this is getting repetitive, and I don't think anyone but us is reading this anyway. You are correct that it doesn't cause issues in practice very often. I strongly disagree that that means it should be ignored. I think we'll have to leave it at that. Thank you for the discussion. – Ben Jul 28 '11 at 23:02
4

Several others have pointed out that this is an instance of the "mutable default argument" issue in Python. The basic reason is that the default arguments have to exist "outside" the function in order to be passed into it.

But the real root of this as a problem has nothing to do with default arguments. Any time it would be bad if a mutable default value was modified, you really need to ask yourself: would it be bad if an explicitly provided value was modified? Unless someone is extremely familiar with the guts of your class, the following behaviour would also be very surprising (and therefore lead to bugs):

>>> class One(object):
...     def __init__(self, my_list=[]):
...         self.my_list = my_list
...
>>> alist = ['hello']
>>> one1 = One(alist)
>>> alist.append('world')
>>> one2 = One(alist)
>>> 
>>> print(one1.my_list) # Huh? This isn't what I initialised one1 with!
['hello', 'world']
>>> print(one2.my_list) # At least this one's okay...
['hello', 'world']
>>> del alist[0]
>>> print one2.my_list # What the hell? I just modified a local variable and a class instance somewhere else got changed?
['world']

9 times out of 10, if you discover yourself reaching for the "pattern" of using None as the default value and using if value is None: value = default, you shouldn't be. You should be just not modifying your arguments! Arguments should not be treated as owned by the called code unless it is explicitly documented as taking ownership of them.

In this case (especially because you're initialising a class instance, so the mutable variable is going to live a long time and be used by other methods and potentially other code that retrieves it from the instance) I would do the following:

class One(object):
    def __init__(self, my_list=[])
        self.my_list = list(my_list)

Now you're initialising the data of your class from a list provided as input, rather than taking ownership of a pre-existing list. There's no danger that two separate instances end up sharing the same list, nor that the list is shared with a variable in the caller which the caller may want to continue using. It also has the nice effect that your callers can provide tuples, generators, strings, sets, dictionaries, home-brewed custom iterable classes, etc, and you know you can still count on self.my_list having an append method, because you made it yourself.

There's still a potential problem here, if the elements contained in the list are themselves mutable then the caller and this instance can still accidentally interfere with each other. I find it not to very often be a problem in practice in my code (so I don't automatically take a deep copy of everything), but you have to be aware of it.

Another issue is that if my_list can be very large, the copy can be expensive. There you have to make a trade-off. In that case, maybe it is better to just use the passed-in list after all, and use the if my_list is None: my_list = [] pattern to prevent all default instances sharing the one list. But if you do that you need to make it clear, either in documentation or the name of the class, that callers are relinquishing ownership of the lists they use to initialise the instance. Or, if you really want to be constructing a list solely for the purpose of wrapping up in an instance of One, maybe you should figure out how to encapsulate the creation of the list inside the initialisation of One, rather than constructing it first; after all, it's really part of the instance, not an initialising value. Sometimes this isn't flexible enough though.

And sometimes you really honestly do want to have aliasing going on, and have code communicating by mutating values they both have access to. I think very hard before I commit to such a design, however. And it will surprise others (and you when you come back to the code in X months), so again documentation is your friend!

In my opinion, educating new Python programmers about the "mutable default argument" gotcha is actually (slightly) harmful. We should be asking them "Why are you modifying your arguments?" (and then pointing out the way default arguments work in Python). The very fact of a function having a sensible default argument is often a good indicator that it isn't intended as something that receives ownership of a pre-existing value, so it probably shouldn't be modifying the argument whether or not it got the default value.

Ben
  • 68,572
  • 20
  • 126
  • 174
  • I agree with your advice about object ownership, but you're going to get the sort of behavior you're describing any time you pass references to mutable objects around, and it's pretty much par for the course in any language -- you get used to it. The mutable default trap is insidious because it's unintuitive, and other languages don't do it that way. – Ian McLaird Jul 27 '11 at 04:10
  • But that's just my point. It bites you because you're not careful around the default argument. But if you're mutating a value passed in, then it should almost always be the case that the **purpose** of the function is to mutate the value passed in. In which case it isn't sensible to have a default value. If there's a bug where you accidentally mutate a default value, there's probably a much more subtle bug where you accidentally mutate a passed-in value that somebody cares about. – Ben Jul 27 '11 at 05:01
  • @Ben: I like your answer, but I have a question. The intent of my code is to indeed be a factory function. Is there good practice in making a factory function I should follow? Such as _not_ using `__init__`? – TorelTwiddler Jul 27 '11 at 17:04
  • @TorelTwiddler: I added a section saying what I would do with your `One` class, and other things to think about (there's a trade-off unfortunately). I hope it helps! I also got rid of the remark about the factory function, which was probably a bit confusing. What I was referring to was that maybe if you expect your argument to provide a new value every time, the *argument itself* could be a factory function (with the default value of `lambda: []`). That is rarely what you actually want to do though, hence editing it out of my answer. – Ben Jul 28 '11 at 00:06
  • @Ben: Thank you for elaborating on your answer! After reading your newest edit, I'm confident that there is no significant reason in my code to allow you to pass it a mutable object (who's ownership will be taken over). I will be populating my lists and dictionaries after the initialization of my class instead to completely avoid any issues with altering passed objects. Again, thank you very much for the thorough answer! – TorelTwiddler Jul 28 '11 at 00:27
3

This is standard behavior of default arguments anywhere in Python, not just in classes.
For more explanation, see Mutable defaults for function/method arguments.

phwd
  • 19,975
  • 5
  • 50
  • 78
Ian McLaird
  • 5,507
  • 2
  • 22
  • 31
3

Basically, python function objects store a tuple of default arguments, which is fine for immutable things like integers, but lists and other mutable objects are often modified in-place, resulting in the behavior you observed.

Colin Valliant
  • 1,899
  • 1
  • 13
  • 20
2

Python functions are objects. Default arguments of a function are attributes of that function. So if the default value of an argument is mutable and it's modified inside your function, the changes are reflected in subsequent calls to that function.

Imran
  • 87,203
  • 23
  • 98
  • 131
1

Not an answer, but it's worth noting this is also true for class variables defined outside any class functions.

Example:

>>> class one:
...     myList = []
...
>>>
>>> one1 = one()
>>> one1.myList
[]
>>> one2 = one()
>>> one2.myList.append("Hello Thar!")
>>>
>>> one1.myList
['Hello Thar!']
>>>

Note that not only does the value of myList persist, but every instance of myList points to the same list.

I ran into this bug/feature myself, and spent something like 3 hours trying to figure out what was going on. It's rather challenging to debug when you are getting valid data, but it's not from the local computations, but previous ones.

It's made worse since this is not just a default argument. You can't just put myList in the class definition, it has to be set equal to something, although whatever it is set equal to is only evaluated once.

The solution, at least for me, was to simply create all the class variable inside __init__.

Fake Name
  • 5,556
  • 5
  • 44
  • 66
  • That's kind of the definition of a **class variable**. It is defined in the class, and holds one value for the class. Whereas an **instance variable** is defined in the instance, and holds one value for the instance. -- As for "it has to be set equal to something", **every** Python label it has to be set equal to something. If you want no other value at the moment, set it equal to `None`. -- The "bug" is that you are expecting Python to behave like some other language you have experience with. Python is not that other language, it's Python. – Jerry B Oct 05 '13 at 07:46