How does Python process lists?

Question

I have a list:

list1 = [1,2,3]

If I use a function to get some data which I want to replace in the original list

new_data = [2,3,4]

Why doesn't

list1 = new_data

mutate the original list? Why does it create a new reference?

list1[:] = new_data

does work, but why doesn't the other expression work?

Because the Python language designer(s) decided that `=` should copy references, not entire objects. Most languages work like that. Is it really so surprising? — Matt Ball, Oct 28 '13 at 04:16
possible duplicate of [Python how to take a list as a parameter and edit its values?](http://stackoverflow.com/questions/19625946/python-how-to-take-a-list-as-a-parameter-and-edit-its-values) — aIKid, Oct 28 '13 at 04:23
Also possible duplicate of [Mutating list in python](http://stackoverflow.com/questions/17131536/mutating-list-in-python) — aIKid, Oct 28 '13 at 04:24

score 6 · Accepted Answer · answered Oct 28 '13 at 04:15

Because that's not how Python works. (What language does work like this?)

Python variable names are just that: names. Assigning foo = whatever just makes foo a new name for the object named by whatever. Simple assignment will never mutate an existing object.
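A minimal sketch of the two statements from the question, side by side. The name `alias` is introduced here only to keep the original list observable; it is not part of the question.

```python
list1 = [1, 2, 3]
alias = list1            # a second name for the same list object
new_data = [2, 3, 4]

# Plain assignment rebinds the name list1; the original list is untouched.
list1 = new_data
rebound_to_new_data = list1 is new_data      # True
alias_after_rebind = list(alias)             # [1, 2, 3]

# Slice assignment replaces the contents of the existing list in place.
list1 = alias            # point list1 back at the original object
list1[:] = new_data
alias_after_slice = list(alias)              # [2, 3, 4]
still_same_object = list1 is alias           # True
```

After the slice assignment, `list1` and `new_data` hold equal contents but remain two separate objects.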


Eevee


2

"What language does work like this?": well, C++ has this behavior, for example. if you have a `std::vector list1;` and later assign `list1 = new_data`, then the vector allocated to `list1` gets a copy of the contents of `new_data`. – SingleNegationElimination Oct 28 '13 at 04:17
6

i swear 90% of beginner confusion about Python is caused by insanity in C++ – Eevee Oct 28 '13 at 04:19
1

@minitech http://www.cplusplus.com/reference/vector/vector/operator=/ c++ totally does have this behavior. it allows overloading assignment. – Eevee Oct 28 '13 at 04:25
Okay, I understand that comment now! And if we’re talking possibilities, VB.NET and C# can do it too with `ByRef` and `ref`, respectively. (Though it’s not exactly the same thing.) – Ry- Oct 28 '13 at 04:28
passing by reference would still only mutate the bucket (variable) though, not the list itself. c++ may be unique in blurring the lines between the two concepts. – Eevee Oct 28 '13 at 04:51

score 4 · Answer 2 · answered Oct 28 '13 at 04:19

Python's names are labels on objects, not memory locations. This is very different from C++. One object may have many names, or no names at all if it is contained within some other object (like a list).

Simple assignments don't change objects, they only rebind names. The object formerly bound to list1 doesn't get changed, though it might be garbage collected if that name was the only way it was referenced in your program.
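As a rough illustration of labels versus objects, here is a small sketch. The names `a`, `b`, and `nested` are made up for the example.

```python
a = [1, 2, 3]
b = a                    # a second label on the same list object
nested = [[10, 20]]      # the inner list has no name of its own

b.append(4)              # mutate the object through one label...
visible_through_a = (a == [1, 2, 3, 4])   # ...and see it through the other

a = ["something else"]   # rebind a; the original object is not changed
original = b             # still [1, 2, 3, 4], reachable through b
inner = nested[0]        # the unnamed list is reached via its container
```

Had `b` not existed, rebinding `a` would have left the original list unreachable and eligible for garbage collection.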

The web page Code like a Pythonista does a great job of explaining this. I'd check it out if you want to understand Python variables better.

score 1 · Answer 3 · answered Oct 28 '13 at 04:16

Because that's how assignment works in Python. A simple assignment to a name rebinds that name to the object on the right-hand side, regardless of which object the name referred to before.


kindall


score 0 · Answer 4 · answered Oct 28 '13 at 04:17

Python names are just references to a position in memory. So:

list1 = new_data

is just making both variables refer to the same position in memory.

On the other hand, list1[:] makes a copy of the list1


Christian Tapia


Assigning to `list1[:]` doesn’t make a copy of `list1`. – Ry- Oct 28 '13 at 04:19
I didn't say anything about assigning something to list1[:], just said that list1[:] makes a copy of it. – Christian Tapia Oct 28 '13 at 04:31
But the question is about the difference between assigning to `list1` versus `list1[:]`, so you should maybe mention the difference. – Ry- Oct 28 '13 at 04:33
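To make the distinction in these comments concrete, here is a short sketch contrasting `list1[:]` used as an expression with `list1[:]` used as an assignment target. The helper names `snapshot`, `ident_before`, and `ident_after` are illustrative only.

```python
list1 = [1, 2, 3]

# As an expression, the slice builds a new list with the same elements.
snapshot = list1[:]
is_separate_copy = snapshot is not list1      # True

# As an assignment target, the slice replaces elements of the existing list.
ident_before = id(list1)
list1[:] = [2, 3, 4]
ident_after = id(list1)
same_object = ident_before == ident_after     # True: no new list was bound
```

The copy taken earlier keeps the old contents, while `list1` itself now holds the new ones.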

score 0 · Answer 5 · answered Oct 28 '13 at 15:25

For a C++ programmer: a C++ reference parameter is another name for the caller's variable itself, while a Python variable is a handle to an object. I think you really want to do something like this, but in Python:

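// C++ code
#include <vector>

std::vector<int> myvector;

// somearg is passed in as a parameter here so the snippet compiles
void myfunction(std::vector<int> &testvector, bool somearg) {
    if (somearg) {
        testvector.push_back(4);                    // modifies the caller's vector
    } else {
        testvector = std::vector<int>{4, 15, 16};   // copy-assigns into the caller's vector
    }
}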

Because testvector is a reference to the caller's vector, and both branches modify that vector, the caller will see your change no matter which path you take.

In Python, this would look like this:

list1 = [1, 2, 3]

def modfunc(mylist, somearg):
    if somearg:
        mylist.append(4)      # mutates the object the caller passed in
    else:
        mylist = [1, 2, 3]    # only rebinds the local name mylist

And while the first one will work, the second one won't. In this case, mylist is not a C++-style reference to the caller's variable; it is a local name bound to the same object that the caller's name is bound to. In the failing case, you rebind the name mylist to a different object, but the caller still has their name pointing to the original object.

In the first case, where it works, you look up the object through the name and manipulate that object directly. Both the caller's and the function's names point to this object, so it works.
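The two branches can be exercised with a quick sketch like the one below. It assumes `somearg` is passed in as a parameter, which is a simplification made for this example; `first` and `second` are hypothetical caller-side names.

```python
def modfunc(mylist, somearg):
    if somearg:
        mylist.append(4)      # mutates the shared list object
    else:
        mylist = [1, 2, 3]    # rebinds the local name only

first = [7, 8]
modfunc(first, True)          # the caller sees the appended element

second = [7, 8]
modfunc(second, False)        # the caller's list is left as it was
```

Here `first` ends up as `[7, 8, 4]`, while `second` is still `[7, 8]`.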

So, what do you do? Well, in short, you don't need to do this. In C++, you often needed reference parameters because a function is limited to a single return value. Sure, you could use structs, but it just wasn't very convenient. In Python, tuples are a natural part of the language. So let's say you wanted to do something like this in C++:

#include <numeric>
#include <vector>

int sumdefault(std::vector<int> &avector) {
    if (avector.empty()) {
        avector = std::vector<int>{1, 2, 3, 4, 5};   // fill in the default contents
    }
    return std::accumulate(avector.begin(), avector.end(), 0);
}

So, you need the int return value to return the sum. You also might change avector, so you need to take it by reference. Returning avector as well (say, in a pair) can be dangerous: returning a reference to a vector created on the local stack isn't valid, and returning by value could be expensive (and unnecessary) if avector is large, yada yada. In Python, you would just return both values:

def sumdefault(mylist=None):
    # None as the default avoids sharing one mutable default list across calls
    if not mylist:
        mylist = [1, 2, 3, 4, 5]
    return mylist, sum(mylist)

alist = [2, 3, 4, 5]
alist, sumalist = sumdefault(alist)

This is (afaik) the proper Pythonic way to deal with this pattern. You'll never waste time unnecessarily copying lists: Python always passes around references to objects. And Python really doesn't have anything like a 'local' variable in the same way C++ does. A variable created in a function has a local name, but the object it names lives on a shared heap. So even though we constructed [1,2,3,4,5] inside the function, that memory isn't going to disappear when we return. Our local name for it will disappear, but the caller will now have a name pointing to it, and it will persist as long as something references it.
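One way to check the "no copying" claim is to compare object identity across the call. This is only a sketch: it uses a `None` default as an assumption about how one might write `sumdefault`, and `returned`, `fresh`, and the totals are names invented for the example.

```python
def sumdefault(mylist=None):
    # Fall back to a default list when nothing (or an empty list) is given.
    if not mylist:
        mylist = [1, 2, 3, 4, 5]
    return mylist, sum(mylist)

alist = [2, 3, 4, 5]
returned, total = sumdefault(alist)
no_copy_made = returned is alist       # True: the same object comes back

# A list built inside the function outlives the call once the caller names it.
fresh, fresh_total = sumdefault([])
```

In the second call, `fresh` is the list created inside the function, which stays alive because the caller now holds a name for it.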
