1

I have a function that I want to use to manipulate an input parameter, however, in the process I must copy the input parameter. I do not want to return the modified object (because multiprocessing may or may not be involved). How can I reassign the input parameter to the copied value? Here's a representation of the function:

from copy import deepcopy, copy

def myFunc(myObj,i):
    myObj.x = 'start'
    # I can't avoid this step:
    modObj = deepcopy(myObj)
    modObj.x = 'modified'
    if i == 0:
        myObj.x = modObj.x
    elif i == 1:
        myObj = modObj
    elif i == 2:
        myObj = copy(modObj)
    elif i == 3:
        myObj.__dict__ = modObj.__dict__

class MyClass(object):
   def __init__(self):
       self.x = "construct"

for i in range(4):
   temp = MyClass()
   myFunc(temp,i)
   print i,temp.x

And here's the output:

0 modified
1 start
2 start
3 modified

Since option #3 is so close to what I want to do, is there an "official" way to do it? E.g.

myObj.shallowcopy(modObj)

where this shallowcopy behaves similarly to the shallow copy in copy. One alternative would be to embed MyClass as a member of some other class, and pass that class to the function... but if I can do that, why can't python just save me that step by copying over member data? I've read this, but they didn't seem to come up with an answer for me.

Community
  • 1
  • 1
amos
  • 5,092
  • 4
  • 34
  • 43
  • 1
    possible duplicate of [Python: How do I pass a variable by reference?](http://stackoverflow.com/questions/986006/python-how-do-i-pass-a-variable-by-reference) – Devin Jeanpierre Jun 23 '11 at 03:04
  • The first answer to that question should also answer yours. – Devin Jeanpierre Jun 23 '11 at 03:07
  • 1
    Why do you need to copy the object? How does the possibility of multiprocessing prevent you from returning the modified copy? – Karl Knechtel Jun 23 '11 at 03:48
  • @Devin, I would consider this a duplicate _except_ for the added `multiprocessing` wrinkle. I would recommend that amos edit the title, though, to incorporate that. – senderle Jun 23 '11 at 03:59
  • 1
    Like Karl, I wonder why you need a copy of the object. I don't know if this is what he's thinking, but my question is: Why can't you just modify the *original* object? Are those four options ALL essential features of your function, or are they options in the sense of "here are four ways I've tried to implement the ONE feature my function is trying to do"? – John Y Jun 23 '11 at 04:09
  • is it duplicate to see if any new possibilities have opened up in the intervening 2 years? besides, as I tried to indicate with my `myObj.__dict__ = modObj.__dict__` line, it just **feels** like there should be a built in way to do this... – amos Jun 23 '11 at 18:33

1 Answers1

1

Update: Ok, now that I understand more about the problem, I would suggest a combination of options 1 and 4: create public update() and dictify() methods. These could be wrappers around option 4 or some variation like myObj.__dict__.update(modObj.__dict__) -- both of which are somewhat hackish.

Or, they could be elegant, readable, self-documenting methods that accept and return (respectively) the appropriate dictionary representation of self, which may or may not coincide with self.__dict__. I think the reason Python doesn't have any built-in functionality for things like this is that there's no truly general way to implement it, because not all objects have a __dict__. This kind of behavior has to have a customized implementation.

If manipulating __dict__ is sufficient for your needs, fine, but you save yourself some trouble if you hide that manipulation behind an abstraction that can be altered later, if you decide you want to implement __slots__ or something like that.

(Leaving the below for historical purposes)


This confuses me: "I do not want to return the modified object (because multiprocessing may or may not be involved)." If you mean the multiprocessing module, then even options 0 and 3 will fail, because processes don't share memory.

>>> for i in range(4):
...     temp = MyClass()
...     p = Process(target=myFunc, args=(temp, i))
...     p.start()
...     p.join()
...     print i,temp.x
... 
0 construct
1 construct
2 construct
3 construct

See? If you want to transfer data between processes, you have to use one of multiprocessing's built-in types.

>>> from multiprocessing import Process, Queue
>>> 
>>> def myFuncQ(myObj, i, return_queue):
...     myObj.x = 'start'
...     modObj = deepcopy(myObj)
...     modObj.x = 'modified'
...     return_queue.put(modObj)
... 
>>> return_queue = Queue()
>>> for i in range(4):
...     temp = MyClass()
...     p = Process(target=myFuncQ, args=(temp, i, return_queue))
...     p.start()
...     p.join()
...     temp = return_queue.get()
...     print i,temp.x
... 
0 modified
1 modified
2 modified
3 modified

If you aren't using multiprocessing, then a Queue will serve as well as any other mutable container for this purpose. That's the closest there is to an "official" way to pass by reference in Python.

senderle
  • 145,869
  • 36
  • 209
  • 233
  • Sorry, i should have clarified this point. i set up this function to be compatible with or without multiprocessing, because I wanted to support python 2.5 (for reasons that are no longer true). when I do use multiprocessing, I print my object to file (the standard format for this data structure is a .pdb file), when I don't use multiprocessing, the objects share memory. I think the key issue here is that in my code, `myObj` can't be pickled, because it would be excessively tedious to enable pickling for all the underlying c++ objects. i believe this precludes multiprocessing. – amos Jun 23 '11 at 18:05
  • @amos, Ah -- "underlying c++ objects" == key information that was missing. Well the point remains: you can only do "pass by reference" in python through a container. In fact, the reason `myObj.__dict__ = modObj.__dict__` works is that you are treating `myObj` _as a container_ -- and in an abstraction-breaking way, since not all objects have a `__dict__`. So the fact that it does what you want it to do is more a coincidence than an indication that `myObj.__dict__ = modObj.__dict__` is a good thing to do. – senderle Jun 23 '11 at 19:10
  • You said "This kind of behavior has to have a customized implementation."... but I don't understand that... consider `a=b`, which makes a new `a` to share `b`'s pointer. What I want is for the original `a` to share `b`'s pointer. It should be as simple as reassigning a single pointer, right? – amos Jun 24 '11 at 18:12
  • Anyway, I went with something similar to your "historical" answer. My function signature has an output container in addition to input. – amos Jun 24 '11 at 18:18
  • It's as simple as reassigning a single pointer _in the relevant scope_. Say you have `def func(val): val = newval` and `a = 5` and you call `func(a)`; then inside `func`, `a` doesn't exist, period. Only `val` exists, and val disappears once `func` exits. So `a` can't be altered by `func`. The _thing that a points to_ can be altered, and `a` can be altered _in its scope_, (with `def func(val): return newval` and then `a = func(a)`). But to alter `a` itself, you'd need to have a reference to `a`, not just to the thing it points to. – senderle Jun 24 '11 at 18:43
  • And to get a reference to `a` in python, you have to put it in a container and pass the container. In effect, all variables are "single-star" pointers in python (c equivalent: `int *a;`), but you're asking for a "double-star pointer" (c equivalent: `int **a;`), which doesn't exist. – senderle Jun 24 '11 at 18:47