
Why do I have to .copy() a numpy array (or a list, which makes for a good minimal example) when I want an independent copy of it, while I don't have to do so with a plain variable? What's the reason for this behaviour?

E.g. setting a = 1 and b = a and then setting b = 2 does not change a, whereas setting a = [0] and b = a and then b[0] = 1 also changes a.

Edit:

It appears that this is a very common issue/question that can be asked in various ways, so I couldn't find it. For a quick "I just need to understand what's happening so that I can move on" I guess the answer (and the answers to the other same/similar questions) are good enough. For a better understanding I guess this link provided by @chthonicdaemon in the comments seems like a good starting point for me.

Longer background

In many situations I have to set up a numpy array with some starting values, which then gets filled with other values while my program runs. For example:

import numpy as np

logger = np.zeros(5)
# various things happen
logger[0] = 1
# more stuff happens
logger[3] = 1

You get the idea.

But often it is desirable to have the base array preserved in order to be able to compare the results to it, so I just set up the base first and then make copies from it. So I would expect that I simply could set it up like this:

base = np.zeros(5)
logger = base
logger[0] = 1

so that

In [1]:base
Out[1]:array([ 0.,  0.,  0.,  0.,  0.])
In [2]:logger
Out[2]:array([ 1.,  0.,  0.,  0.,  0.])

However, with the setup above the two names stay "connected", so that I actually get

In [1]:base
Out[1]:array([ 1.,  0.,  0.,  0.,  0.])

I can fix that by explicitly using

logger = base.copy()

but I'm wondering why I have to.

JC_CL
  • Integers, strings and tuples, for example, are *immutable*: they can't be modified after their creation. Lists, sets and numpy arrays are *mutable*: they can be changed. – Chris_Rands May 03 '18 at 08:15
  • [This](https://stackoverflow.com/questions/575196/in-python-why-can-a-function-modify-some-arguments-as-perceived-by-the-caller) is exactly the same issue, just with a function thrown into the mix. The difference is that you do two different things with those variables. Changing the value of a *variable* with an assignment like `logger = foo` is something completely different than modifying an *object* with something like `logger[0] = 1`. – Aran-Fey May 03 '18 at 08:16
  • [This](https://nedbatchelder.com/text/names1.html) is the best explanation of how Python names work that I know of. – chthonicdaemon May 03 '18 at 08:18
  • If you have ever learnt C/C++, you can easily understand this, as it is implemented just like a pointer. – Sraw May 03 '18 at 08:20
  • `logger` and `base` reference the same object. – hpaulj May 03 '18 at 08:22
  • I guess I begin to see what's happening/why. Rather confusing that `=` has different meanings, depending on what I apply it to. – JC_CL May 03 '18 at 08:24
  • I don't like that dupe. It doesn't address the difference between `foo = bar` and `foo[0] = bar`. This question is asking why two different operations do two different things, not why an assignment doesn't make a copy. – Aran-Fey May 03 '18 at 08:31
  • Also see http://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list-in-python That talks about plain `list`, but it's the same principle. And for a shorter article with cute diagrams that discusses the same stuff as that Ned Batchelder article, but from a slightly different angle, please see [Other languages have "variables", Python has "names"](http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#other-languages-have-variables) – PM 2Ring May 03 '18 at 08:43

1 Answer


Lists (and numpy arrays) are mutable objects. str and int are immutable. Immutable objects can't be changed once created, so:

a = 1
b = a

Both names point to the same immutable int object, whose value is 1. When you "change" b, you actually assign it a new value:

b = 3

Now b points to a different object, while a still points to the original one.
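
As a minimal sketch of the rebinding described above, the identity check below (using Python's built-in `is` operator; the names `same_before` and `same_after` are only illustrative) shows that assigning to b gives b a new object and leaves a alone:

```python
# Rebinding a name never touches the object the other name refers to.
a = 1
b = a
same_before = a is b   # both names refer to one int object

b = 3                  # rebinds b to a different object; a is untouched
same_after = a is b

print(a, b, same_before, same_after)  # 1 3 True False
```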

Now, lists are mutable. That means you can change these objects after they are created.

alist = list((1, 2, 3))
newlist = alist

alist == newlist
True

alist now is a reference to a list object. The list object is just a container of references to the actual objects. So when I assign a new variable the value of alist, I just create another variable pointing to the same container object.

Manipulating one object affects both variables, because they refer to the same container object.
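
A small sketch of that shared-container behaviour, using plain lists so it runs without numpy (the name `shares_object` is just for illustration). It also contrasts modifying the object through a name with rebinding the name itself:

```python
alist = [1, 2, 3]
newlist = alist                    # second name, same list object

shares_object = newlist is alist   # identity, not just equal contents

newlist[0] = 99                    # mutates the one shared list in place
print(alist)                       # [99, 2, 3]

newlist = [7, 8, 9]                # rebinding: newlist now names a different list
print(alist)                       # [99, 2, 3]
```

Note that `==` only compares contents, so it would also be True for two separate lists holding the same values; `is` is what tells you the two names refer to one object.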

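What copy does is create a new list object that holds copies of the references. This is a shallow copy; copy.deepcopy goes further and also copies the objects those references point to. Because the copy is a new container object, assigning to an element of one list does not affect the other. A numpy array's .copy() likewise returns a new array with its own data.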
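
To illustrate what a copy does and does not duplicate, here is a hedged sketch using the standard-library `copy` module on a nested list (the variable names are made up for this example). A shallow copy gets a new outer container but still shares the inner lists, while a deep copy is fully independent:

```python
import copy

nested = [[0, 0], [0, 0]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

shallow.append([1, 1])           # only the shallow copy's container grows
shallow[0][0] = 5                # inner list is shared with nested
deep[1][1] = 9                   # deep copy is fully independent

print(nested)   # [[5, 0], [0, 0]]
print(shallow)  # [[5, 0], [0, 0], [1, 1]]
print(deep)     # [[0, 0], [0, 9]]
```

For a flat container of immutable values, such as a list of numbers, the shallow copy is already enough to get independent behaviour.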

Chen A.
  • "Now, lists are immutable." Shouldn't that be "mutable"? – JC_CL May 03 '18 at 08:22
  • It doesn't matter if the object is mutable or not. What matters is what you do with that object. If you don't modify a list, you don't need to copy it. The difference lies in `foo = x` vs `foo[0] = x`, that is, modifying the *variable* vs modifying the *object*. – Aran-Fey May 03 '18 at 08:22
  • @JC_CL Thanks for pointing it out, I fixed it – Chen A. May 03 '18 at 08:29
  • @Aran-Fey your last sentence nails it. I couldn't state it better: modifying the variable vs the object. – Chen A. May 03 '18 at 08:30