Whatever is associated with a variable name has to be stored in the program's memory somewhere. An easy way to think of this is that every byte of memory has an index-number. For simplicity's sake, let's imagine a simple computer where these index-numbers go from 0 (the first byte) upwards to however many bytes there are.
Say we have a sequence of 37 bytes that a human might interpret as some words:
"The Owl and the Pussy-cat went to sea"
The computer is storing them in a contiguous block, starting at some index-position in memory. This index-position is most often called an "address". The address is just a number: the byte-number of the memory these letters are residing in.
@12000 The Owl and the Pussy-cat went to sea
So at address 12000 is a T, at 12001 an h, at 12002 an e ... up to the last a at 12036 (37 bytes starting at 12000 end at 12000 + 36).
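To make the arithmetic concrete, here is a minimal sketch of that imaginary computer in Python. The start address 12000 is just the made-up number from the example, and `base` and `address_of` are names invented for this illustration; real programs do not get to pick their addresses like this.

```python
# The 37 bytes from the example, as raw ASCII bytes.
text = b"The Owl and the Pussy-cat went to sea"

base = 12000  # made-up start address from the example

def address_of(index):
    """Pretend address of the byte at position `index` in the block."""
    return base + index

first = address_of(0)
last = address_of(len(text) - 1)

print(len(text))                     # 37
print(first, chr(text[0]))           # 12000 T
print(address_of(1), chr(text[1]))   # 12001 h
print(last, chr(text[-1]))           # 12036 a
```

Because counting starts at the base address itself, a block of 37 bytes ends at base + 36.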
I am labouring the point here because it's fundamental to every programming language. That 12000 is the "address" of this string. It's also a "reference" to its location. For most intents and purposes an address is a pointer is a reference. Different languages have differing syntactic handling of these, but essentially they're the same thing - dealing with a block of data at a given number.
Python and Java try to hide this addressing as much as possible, whereas languages like C are quite happy to expose pointers for exactly what they are.
The take-away from this is that an object reference is the number of where the data is stored in memory. (As is a pointer.)
Now, most programming languages distinguish between simple types (characters and numbers) and complex types (strings, lists and other compound types). This is where the reference to an object makes a difference.
Simple types behave independently, as though each variable has its own memory for storage. Imagine the following sequence in Python:
>>> a = 3
>>> b = a
>>> b
3
>>> b = 4
>>> b
4
>>> a
3 # <-- original has not changed
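Assigning 4 to b did not touch a: the two variables behave independently. (Strictly speaking, Python numbers are objects too, but they are immutable, so b = 4 simply points b at a different value rather than changing anything in place.) But with a complex type: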
>>> s = [ 1, 2, 3 ]
>>> t = s
>>> t
[1, 2, 3]
>>> t[1] = 8
>>> t
[1, 8, 3]
>>> s
[1, 8, 3] # <-- original HAS changed
We assigned t to be s, but in this case t is s - they share the same memory. Wait, what! Here we have found out that both s and t are references to the same object - they simply share (point to) the same address in memory.
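If you want to check this for yourself, Python's `is` operator and the built-in `id()` function both report whether two names refer to the same object. This is only a small sketch; in CPython `id()` happens to be the object's memory address, but that is an implementation detail rather than something the language promises.

```python
s = [1, 2, 3]
t = s            # t is another name for the same list, not a copy

same_object = t is s
same_id = id(t) == id(s)

t[1] = 8         # modify the list through t

print(same_object)  # True
print(same_id)      # True
print(s)            # [1, 8, 3]
```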
One place Python differs from some other languages is that its strings behave like a simple type: they cannot be modified in place, so in practice they are independent, just like numbers:
>>> j = 'Pussycat'
>>> k = j
>>> k
'Pussycat'
>>> k = 'Owl'
>>> j
'Pussycat' # <-- Original has not changed
Whereas in C, strings are definitely handled as complex types (a block of characters reached through a pointer), and would behave like the Python list example.
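The difference can be sketched in Python itself. A `str` cannot be changed in place, so all you can do is point the name at a different string. A `bytearray` is used here as a rough stand-in for a C-style character buffer (an analogy for illustration only, not how C actually works), and it behaves like the list example.

```python
j = 'Pussycat'
k = j

# Strings are immutable: trying to change one in place fails.
try:
    k[0] = 'M'
    string_changed = True
except TypeError:
    string_changed = False

# A bytearray is a mutable block of bytes, closer to a C char buffer.
buf = bytearray(b'Pussycat')
alias = buf          # second reference to the same buffer
alias[0] = ord('M')  # change the first byte through the alias

print(string_changed)  # False
print(j)               # Pussycat
print(buf)             # bytearray(b'Mussycat')
```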
The upshot of all this is that when an object handled by reference is modified, all references to that object "see" the change. So if the object is passed to a function that modifies it (i.e. the content of memory holding the data is changed), the change is reflected outside that function too.
But if a simple type is passed to a function and changed there, it behaves as if it was copied into the function, so the changes are not seen in the original.
For example:
def fnA( my_list ):
    my_list.append( 'A' )
a_list = [ 'B' ]
fnA( a_list )
print( str( a_list ) )
['B', 'A'] # <-- a_list was changed inside the function
But:
def fnB( number ):
    number += 1
x = 3
fnB( x )
print( x )
3 # <-- x was NOT changed inside the function
So keeping in mind that the memory of "objects" that are used by reference is shared by every reference to them, and the memory of simple types is not, it's fairly obvious that the two types operate differently.
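One practical consequence, sketched below: if you want a function to work on a list without affecting the caller's list, hand it a copy. The function name `fnC` is made up to match the examples above, and `list.copy()` makes a shallow copy only, so any nested lists inside it would still be shared.

```python
def fnC(my_list):
    my_list.append('A')   # modifies whichever list object it was given
    return my_list

a_list = ['B']
result = fnC(a_list.copy())   # pass a shallow copy, not the original

print(a_list)   # ['B']
print(result)   # ['B', 'A']
```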