
Consider this code:

# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!

# assignment behaviour for class object
class Klasa:
    def __init__(self, num):
        self.num = num

a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!

Questions:

  1. Why does the assignment operator work differently for fundamental types and class objects (for fundamental types it copies by value, for class objects it copies by reference)?
  2. How can I copy class objects by value only?
  3. How can I make references to fundamental types, like C++'s int& b = a?
scdmb
  • Outside the Python language, the terms "by reference" / "by value" are already confused and confusing. Inside Python, whose data model and execution model are so special, these terms are even more confusing and should be avoided. That's my opinion, but note I'm not a Python expert. See (http://stackoverflow.com/a/986145/551449) and plenty of other posts and blogs on this subject. It seems you need to study Python's data and execution models a little more – eyquem Dec 11 '11 at 13:00

4 Answers


This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.

Let's take the first case. When you say a = b = 0, an int object with value 0 is created and two references to it are made (one is a and the other is b). These two variables point to the same object (the integer we created). Now we run a = 4. An int object with value 4 is created and a is made to point to it. This means that 4 gains a reference, the number of references to 0 drops by one, and b still points to 0. (CPython actually reuses cached objects for small integers such as 0 and 4, so the exact reference counts differ, but the picture is the same.)
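A minimal sketch of the first case, written for Python 3 (so print is a function), that checks the behaviour described above with the `is` operator, which compares object identity rather than value:

```python
a = b = 0
print(a is b)   # True: both names refer to the same int object

a = 4           # rebinds a to a different object; b is untouched
print(a is b)   # False
print(a, b)     # 4 0
```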

Compare this with a = 4 in C, where the value is written into the area of memory that a names. a = b = 4 in C means that 4 is written to two separate pieces of memory, one for a and another for b.

Now the second case: a = Klasa(2) creates an object of type Klasa, increments its reference count by one and makes a point to it. b = a simply takes what a points to, makes b point to the same thing and increments the reference count of that thing by one. It's the same as what would happen if you did a = b = Klasa(2). Printing a.num and b.num gives the same result since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now you change the object by assigning a value to one of its attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
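The same check for the second case, again as a Python 3 sketch: id() and `is` both confirm that b = a creates no new Klasa instance, so mutating an attribute through one name is visible through the other.

```python
class Klasa:
    def __init__(self, num):
        self.num = num

a = Klasa(2)
b = a                   # no copy: b is a second name for the same object
print(id(a) == id(b))   # True
print(a is b)           # True, the idiomatic spelling of the same check

a.num = 3               # mutates the shared object...
print(b.num)            # 3 ...so the change is visible through b
```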

Now, for the answers to your questions.

  1. The assignment operator doesn't work differently for these two. All it does is add a reference to the rvalue and make the lvalue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than in simple assignments).
  2. If you want copies of objects, use the copy module.
  3. As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
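For question 2, a sketch (Python 3) using the standard copy module. copy.copy makes a shallow copy, which is enough for Klasa because its only attribute refers to an immutable int; objects holding mutable attributes may need copy.deepcopy instead.

```python
import copy

class Klasa:
    def __init__(self, num):
        self.num = num

a = Klasa(2)
b = copy.copy(a)        # new Klasa instance whose attributes refer to the same values
print(a is b)           # False
a.num = 3               # rebinds a.num only; b.num still refers to 2
print(a.num, b.num)     # 3 2
```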
Noufal Ibrahim
  • I think that in the expression "pass by reference", the word "reference" designates a number, i.e. a memory address. But in most of your text, you use the word "reference" to mean "a piece of memory that holds a number that is the address of a memory location", that is to say "reference" is then used as a synonym of "pointer". As "pointer" and "reference" have shifting meanings depending on the language considered, and Python has special data and execution models, there's a whiff of confusing ambiguity in your text, as in the vast majority of texts on this subject – eyquem Dec 11 '11 at 13:25
  • When you write ``a``, do you mean the object (a structure of bits in memory), the reference to the object (a chunk of memory acting as a box), or the identifier? – eyquem Dec 11 '11 at 13:28
  • eyquem, I agree. I should probably define terms before I talk about this since it is a potentially confusing subject. – Noufal Ibrahim Dec 11 '11 at 14:03
  • `a` is a variable. It's a programmatic way of referring to a python object. When I say `a`, it *means* the object. A single object can have multiple names so it's quite possible for me to say `b` and still mean the same thing. – Noufal Ibrahim Dec 11 '11 at 14:10
  • A reference is a reference. Implementation details are implementation details. Python variables are references, in that they refer to values, in the ordinary English sense of the word "refer". When you call a function, the values are passed by reference, in the sense that the arguments to the function refer to the same values that the parameters at the call site do. The main problem I see with the wording here is that "point to" ought to read "refer to", for consistency. – Karl Knechtel Dec 11 '11 at 16:29

Quoting from the Data Model documentation:

Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer,” code is also represented by objects.)

From Python's point of view, the term "fundamental data type" means something quite different from C/C++: it is used (as in the ctypes module) to map C/C++ data types to Python. So let's leave it out of the discussion for now and consider the fact that all data are objects, each an instance of some class. Every object has an identity (somewhat like an address), a value, and a type.
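A small Python 3 sketch of the three properties named above, using an int to show that it is an ordinary object like any other. The exact id() value is an implementation detail (in CPython it happens to be the object's memory address), so it is stored rather than printed.

```python
x = 20
print(type(x))                                    # <class 'int'>
print(x)                                          # 20
print(isinstance(x, int), isinstance(x, object))  # True True
identity = id(x)   # unique among objects alive at the same time
```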

All objects are copied by reference. For example:

>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>

The only way to have a new instance is by creating one.

>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False

This may sound complicated at first, but to simplify things a bit, Python makes some types immutable. For example, you can't change a string in place; you have to slice it and build a new string object.
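A quick Python 3 illustration of string immutability: item assignment raises TypeError, while slicing and concatenation produce a new string and leave the original untouched.

```python
s = "hello"
try:
    s[0] = "j"            # strings do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

t = "j" + s[1:]           # slicing and concatenation build a new string
print(s, t)               # hello jello
```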

Copying by reference often gives unexpected results. For example:

x=[[0]*8]*8 may give you the impression that it creates a two-dimensional list of 0s. But in fact it creates a list of eight references to the same inner list object [0, 0, 0, 0, 0, 0, 0, 0]. So assigning x[1][1] = 1 would end up changing every "row" at the same time.
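A Python 3 sketch of the pitfall and the usual fix: a list comprehension evaluates [0] * 8 once per iteration, so each row is a distinct list.

```python
x = [[0] * 8] * 8                          # eight references to ONE inner list
x[1][1] = 1
print(all(row[1] == 1 for row in x))       # True: every "row" changed
print(all(row is x[0] for row in x))       # True: they are all the same list

y = [[0] * 8 for _ in range(8)]            # builds eight distinct inner lists
y[1][1] = 1
print(sum(row[1] for row in y))            # 1: only one row changed
```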

The copy module provides a function called deepcopy that creates a new instance of the object (recursively copying the objects it refers to) rather than a shallow copy. This is useful when you intend to have two distinct objects and manipulate them separately, just as you intended in your second example.
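A Python 3 sketch of the difference between copy.copy (shallow) and copy.deepcopy on a nested list: the shallow copy is a new outer list that still shares its inner lists with the original, while the deep copy shares nothing.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)
print(shallow[0])   # [1, 2, 99]: shared inner list
print(deep[0])      # [1, 2]: independent copy
```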

To extend your example:

>>> import copy
>>> class Klasa:
    def __init__(self, num):
        self.num = num


>>> a = Klasa(2)
>>> b = copy.deepcopy(a)
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 2 - different!
3 2
Abhijit
  • +1 for the references to the documentation. – Noufal Ibrahim Dec 11 '11 at 12:38
  • @Abhijit When you write _"copied by reference"_ for the example ``x=20`` then ``y=x``, what is copied? Personally, I think that strictly nothing is copied, and that's why "copy by" is meaningless in Python on some occasions, maybe all. – eyquem Dec 11 '11 at 13:09
  • The internal data (a pointer, specifically a PyObject*, in the C implementation) that is used to cause variables to refer to values, is copied. :) – Karl Knechtel Dec 11 '11 at 16:31

It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.

Integers, by the way, are immutable: you can't modify their value. All you can do is make a new integer and rebind your reference (as you did in your first example).
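A Python 3 sketch contrasting the two operations this answer distinguishes: mutating a shared object (visible through every name) versus rebinding one name (invisible to the others). The last part shows that even n += 1 on an int is a rebinding, because ints are immutable.

```python
a = [1, 2]
b = a
a.append(3)         # mutates the object a and b share
print(b)            # [1, 2, 3]

a = [9]             # rebinds a; b still refers to the old list
print(b)            # [1, 2, 3]

n = m = 10
n += 1              # ints are immutable, so += builds a new int and rebinds n
print(n, m)         # 11 10
```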


Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.

Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.

In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.

The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
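A Python 3 sketch of that difference: a plain class instance accepts new attributes, whereas assigning an attribute on an int raises AttributeError, since int instances have no per-instance attribute dictionary.

```python
class Klasa:
    def __init__(self, num):
        self.num = num

k = Klasa(2)
k.extra = "ok"                # ordinary instances accept new attributes

n = 5
try:
    n.extra = "nope"          # int instances have no __dict__ to store it
except AttributeError as exc:
    print("AttributeError:", exc)
```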

You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in place rather than the reference itself; the reference can never be "reseated" once it is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.
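Python has no direct equivalent of C++ int& b = a (question 3). A common workaround, sketched here in Python 3 with a hypothetical Ref class that is not part of any library, is to share a small mutable object and update its attribute instead of rebinding either name.

```python
class Ref:
    """Hypothetical mutable cell used to share one value between names."""

    def __init__(self, value):
        self.value = value

a = Ref(0)
b = a              # both names refer to the same cell
b.value = 4        # update the cell's internal state, not the names
print(a.value)     # 4
```

Rebinding b itself (b = Ref(7)) would still leave a untouched, which is exactly the "no reseating through assignment" behaviour described above.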

Karl Knechtel