populating lists -- what is python doing?

Question

i THINK i've solved my own issue already, but i'm seeking a better understanding of why, or to be enlightened/set straight.

i have a list i call vec:

vec = [0.0, 0.0]

which changes values as data are read in.

in order to compare the current and previous values, i have another list which i call oldvec. if i define oldvec as

oldvec = vec

then it changes values every time vec changes values, so a comparison is useless -- they're always the same.

however, if i instead write

oldvec = [vv for vv in vec]

i don't have this problem -- oldvec keeps its values even as vec changes, so the comparison between current and previous vectors works as i need it to, i.e. it actually detects repeats and non-repeats! ... WHY?

Just to add to all the good answers below, you may enjoy reading SO member Ned Batchelder's comprehensive article with cute diagrams, [Facts and myths about Python names and values](http://nedbatchelder.com/text/names.html). — PM 2Ring, Oct 05 '14 at 05:43
possible duplicate of [Python references](http://stackoverflow.com/questions/2797114/python-references) — simonzack, Oct 05 '14 at 06:46
Wow, great! Many thanks for all the excellent info (including the links). — user3903270, Oct 05 '14 at 06:52

score 1 · Answer 1 · answered Oct 05 '14 at 04:38

Setting oldvec = vec makes oldvec refer to the very same list object as vec. You haven't created a new list; you've simply given the existing one another name. The list comprehension, by contrast, explicitly builds a new list with the same elements, equivalent to list(vec) or vec[:] (or vec.copy() on Python 3.3 and later).
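As a rough illustration of that difference, the sketch below uses the `is` operator to check whether two names refer to the same object. The names `alias`, `copy_a`, `copy_b` and `copy_c` are made up for this example and do not appear in the question.

```python
vec = [0.0, 0.0]

alias = vec                    # no new list: a second name for the same object
copy_a = [vv for vv in vec]    # list comprehension builds a new list
copy_b = list(vec)             # so does the list() constructor
copy_c = vec[:]                # and so does a full slice

print(alias is vec)    # True: same object
print(copy_a is vec)   # False: a different object...
print(copy_a == vec)   # True: ...that holds equal values
```

`is` compares identity, while `==` compares contents, which is why a copy can be equal to vec without being vec.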

xavier

alexwlchan · Accepted Answer · 2014-10-05T12:30:45.450

One way to see what's going on under the hood is to use the id function. It returns an object's identity: an integer that is unique to that object for as long as the object exists. In CPython (the standard interpreter) this happens to be the object's memory address.

Let's run it after each of these three assignments and compare the results (these numbers are from my computer; you'll get different ones if you run it yourself):

>>> vec = [0.0, 0.0]
>>> print(id(vec))
4501729936

>>> oldvec1 = vec
>>> print(id(oldvec1))
4501729936

>>> oldvec2 = [vv for vv in vec]
>>> print(id(oldvec2))
4502046984

We see that vec and oldvec1 refer to the same address, so they're two different labels for the same object. Under the hood, Python is manipulating the object at address 4501729936: the variable names vec and oldvec1 are just convenient labels for us to use. They don’t refer to “distinct” objects.

By contrast, oldvec2 is somewhere completely different. When Python runs the list comprehension, it doesn’t know that this will happen to produce the same list as before, so it creates a new copy of that list.

Here's a quick n' dirty picture to show what's going on. Although the red blob and the green blob happen to contain the same information, they are two different blobs. Both vec and oldvec1 point to the same red blob, so any operation on either one will affect the underlying red blob and be reflected in the other. By contrast, oldvec2 points to a completely different green blob, which happens to be a copy of the information in the red blob, but changes to the green blob don’t affect the red blob.

[Image: diagram of vec and oldvec1 pointing to one red blob, and oldvec2 pointing to a separate green blob]
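To make the consequence of the shared object concrete, here is a small sketch that changes vec and then looks at the other two names. The value 1.5 and the replacement list are arbitrary choices for the demonstration.

```python
vec = [0.0, 0.0]
oldvec1 = vec                   # same object as vec
oldvec2 = [vv for vv in vec]    # separate copy

vec[0] = 1.5                    # mutate the shared list in place

print(oldvec1)   # [1.5, 0.0]  (sees the change)
print(oldvec2)   # [0.0, 0.0]  (unaffected)

vec = [9.0, 9.0]                # rebinding: vec now names a brand-new list

print(oldvec1)   # [1.5, 0.0]  (still the original object)
```

Note the difference between the last two steps: assigning to vec[0] modifies the existing list, whereas assigning to vec itself just moves the label to a different list.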

score 0 · Answer 3 · edited May 23 '17 at 10:09

In Python, variables are "references", meaning that you can have two variables referring to the same object. In your first example that's what is happening: two names for the same list.

If you need a second actual list, you can "copy" the first one. For ways to do that, see the question "How to clone or copy a list?"

Note that this applies to the items within the list too--if they were more complicated objects, you would have the choice to do a "deep copy" of the list, copying every single part of every element, or a "shallow copy", copying just the references, so you'd have a new list containing new references to the original objects. You need to choose the right approach for each use case.
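The shallow/deep distinction can be sketched with the standard library's copy module. The nested list `rows` and the value 99.0 are invented for this example; the question's vec holds plain floats, for which a shallow copy is enough.

```python
import copy

rows = [[0.0, 0.0], [1.0, 1.0]]   # a list whose items are themselves lists

shallow = copy.copy(rows)         # new outer list, same inner lists
deep = copy.deepcopy(rows)        # new outer list and new inner lists

rows[0][0] = 99.0                 # mutate one of the inner lists

print(shallow[0][0])              # 99.0: the shallow copy shares the inner list
print(deep[0][0])                 # 0.0: the deep copy does not
print(shallow[0] is rows[0])      # True
print(deep[0] is rows[0])         # False
```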

score 0 · Answer 4 · answered Oct 05 '14 at 04:38

You should think of a Python list as an object: something instantiated somewhere in memory, with one or more pointers storing its memory address. That is, when you write [], you are allocating new space somewhere in memory. So when you write vec = [0.0, 0.0], Python creates a new list somewhere in memory, and its address is stored in the variable vec. When you then do oldvec = vec, you're simply copying that address from vec to oldvec.

Let me illustrate with an example: imagine that your list [0.0, 0.0] is stored at address 0x0800. When you say vec = [0.0, 0.0], the variable vec receives 0x0800. When you say oldvec = vec, oldvec receives the same 0x0800. So when you access the first element of oldvec, you are in fact accessing the same list that vec points to.

Now, think about your new line: oldvec = [vv for vv in vec]. The [] creates a new list somewhere else in memory, right? That list is populated with the elements of vec, as the for clause specifies. So Python creates a new list, somewhere else in memory, storing 0.0 and 0.0 (and new elements later, if I understood correctly what you explained). That's how Python handles your command internally.
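Applied to the use case in the question, a minimal sketch might look like the following. The `readings` data and the `repeats` list are hypothetical stand-ins, since the question does not show how the data are read in.

```python
# Hypothetical input: pairs of values arriving one at a time.
readings = [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]

vec = [0.0, 0.0]
oldvec = list(vec)        # independent snapshot of the starting values
repeats = []

for x, y in readings:
    vec[0], vec[1] = x, y             # update vec in place
    repeats.append(vec == oldvec)     # compare with the previous values
    oldvec = list(vec)                # take a fresh snapshot for next time

print(repeats)   # [False, True, False]
```

Had the snapshot line been oldvec = vec, both names would refer to one list and every comparison would come out True.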

Hope that helps.
