15

A few times I accidentally modified the input to a function. Since Python has no constant references, I'm wondering what coding techniques might help me avoid making this mistake too often?

Example:

class Table:
  def __init__(self, fields, raw_data):
    # fields is a dictionary with field names as keys, and their types as value 
    # sometimes, we want to delete some of the elements 
    for field_name, data_type in fields.items():
      if some_condition(field_name, raw_data):
        del fields[field_name]
    # ...


# in another module

# fields is already initialized here to some dictionary
table1 = Table(fields, raw_data1) # fields is corrupted by Table's __init__
table2 = Table(fields, raw_data2)

Of course the fix is to make a copy of the parameter before I change it:

  def __init__(self, fields, raw_data):
    fields = copy.copy(fields)
    # but copy.copy is safer and more generally applicable than .copy 
    # ...

But it's so easy to forget.

I'm half thinking to make a copy of each argument at the beginning of every function, unless the argument potentially refers to a large data set which may be expensive to copy or unless the argument is intended to be modified. That would nearly eliminate the problem, but it would result in a significant amount of useless code at the start of each function. In addition, it would essentially override Python's approach of passing parameters by reference, which presumably was done for a reason.

max
  • 49,282
  • 56
  • 208
  • 355
  • 1
    Apart from always using immutable types? –  Sep 11 '11 at 17:39
  • 2
    If I need a dictionary, I doubt using a frozen dict (that I have to implement myself) would be worth it? – max Jan 16 '12 at 21:12

4 Answers4

14

First general rule: don't modify containers: create new ones.

So don't modify your incoming dictionary, create a new dictionary with a subset of the keys.

self.fields = dict( key, value for key, value in fields.items()
                     if accept_key(key, data) )

Such methods are typically slightly more efficient then going through and deleting the bad elements anyways. More generally, its often easier to avoid modifying objects and instead create new ones.

Second general rule: don't modify containers after passing them off.

You can't generally assume that containers to which you have passed data have made their own copies. As result, don't try to modify the containers you've given them. Any modifications should be done before handing off the data. Once you've passed the container to somebody else you are no longer the sole master of it.

Third general rule: don't modify containers you didn't create.

If you get passed some sort of container, you do not know who else might be using the container. So don't modify it. Either use the unmodified version or invoke rule1, creating a new container with the desired changes.

Fourth general rule: (stolen from Ethan Furman)

Some functions are supposed to modify the list. That is their job. If this is the case make that apparent in the function name (such as the list methods append and extend).

Putting it all together:

A piece of code should only modify a container when it is the only piece of code with access to that container.

Winston Ewert
  • 44,070
  • 10
  • 68
  • 83
  • Thanks. Though not modifying objects after passing them on, that's a bit too extreme, isn't it? Since I have to allow the chance that any function would modify them, it means I can't pass object `x` to any other functions after I passed it to one function? – max Jan 16 '12 at 21:14
  • @max, added third general rule: the functions shouldn't be modifying `x`. The general idea is that its only really safe to modify a container when only one piece of code is using it. – Winston Ewert Jan 16 '12 at 22:34
  • @max You may be interested looking into functional programming - they get by without modifying existing objects at all (most of the time). That programming style has some extremely useful advantages. – Voo Jan 16 '12 at 22:42
  • 3
    Fourth general rule: if your function *does* modify the container passed in, make that apparant in the function name (such as the `list` methods `append` and `extend`). – Ethan Furman Jan 17 '12 at 17:18
4

Making copies of parameters 'just-in-case' is a bad idea: you end up paying for it in lousy performance; or you have to keep track of the sizes of your arguments instead.

Better to get a good understanding of objects and names and how Python deals with them. A good start being this article.

The importart point being that

def modi_list(alist):
    alist.append(4)

some_list = [1, 2, 3]
modi_list(some_list)
print(some_list)

has exactly the same affect as

some_list = [1, 2, 3]
same_list = some_list
same_list.append(4)
print(some_list)

because in the function call no copying of arguments is taking place, no creating pointers is taking place... what is taking place is Python saying alist = some_list and then executing the code in the function modi_list(). In other words, Python is binding (or assigning) another name to the same object.

Finally, when you do have a function that is going to modify its arguments, and you don't want those changes visible outside the function, you can usually just do a shallow copy:

def dont_modi_list(alist):
    alist = alist[:]  # make a shallow copy
    alist.append(4)

Now some_list and alist are two different list objects that happen to contain the same objects -- so if you are just messing with the list object (inserting, deleting, rearranging) then you are fine, buf if you are going to go even deeper and cause modifications to the objects in the list then you will need to do a deepcopy(). But it's up to you to keep track of such things and code appropriately.

Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • That sounds like the only realistic outcome – max Jan 16 '12 at 23:11
  • The article you have linked is down - the latest available snapshot of it is from Nov 11. 2020. Here you go - I can't edit the answer. https://web.archive.org/web/20201111195827/http://effbot.org/zone/call-by-object.htm – Lou Sep 26 '22 at 10:48
1

You can use a metaclass as follows:

import copy, new
class MakeACopyOfConstructorArguments(type):

    def __new__(cls, name, bases, dct):
        rv = type.__new__(cls, name, bases, dct)

        old_init = dct.get("__init__")
        if old_init is not None:
            cls.__old_init = old_init
            def new_init(self, *a, **kw):
                a = copy.deepcopy(a)
                kw = copy.deepcopy(kw)
                cls.__old_init(self, *a, **kw)

        rv.__init__ = new.instancemethod(new_init, rv, cls)
        return rv

class Test(object):
    __metaclass__ = MakeACopyOfConstructorArguments

    def __init__(self, li):
        li[0]=3
        print li


li = range(3)
print li
t = Test(li)
print li
rocksportrocker
  • 7,251
  • 2
  • 31
  • 48
1

There is a best practice for this, in Python, and is called unit testing.

The main point here is that dynamic languages allows for much rapid development even with full unit tests; and unit tests are a much tougher safety net than static typing.
As writes Martin Fowler:

The general argument for static types is that it catches bugs that are otherwise hard to find. But I discovered that in the presence of SelfTestingCode, most bugs that static types would have were found just as easily by the tests.

rob
  • 36,896
  • 2
  • 55
  • 65
  • 3
    as much as I love unit testing, I don't see how it helps especially much here. – Winston Ewert Jan 16 '12 at 23:57
  • 2
    Maybe. I suspect a unit test would probably miss this bug because it only happens when you pass the same container to multiple places. You'd have to do integration tests. Its a great answer for the question of how to avoid bugs in dynamically typed languages. It seems out of place here, as it doesn't reference the kind of bugs the OP asks about. – Winston Ewert Jan 17 '12 at 14:24
  • The OP knows he makes this mistake often, so having a unit test check the container *after* the function could reveal the bug -- however, if no changes occured in the test, it would also miss it. – Ethan Furman Jan 17 '12 at 17:24
  • 1
    By contract, the function described by OP should not change the parameters. Now, you may enforce this contract in several ways, among them, static typing which is not possible with Python. You may then try to overcome this limitation, or you may go to the root: ensuring the contract correctness. With dynamic languages, a very reasonable way to do it is by mean of unit testing, with a test to explicitly check the state of the container after the function call. – rob Jan 17 '12 at 21:01