0

In the following code, I intended to make a list starting from an empty one by appending (random) numpy arrays. For a temporary variable, I initialized a numpy array variable 'sample_pt' which worked as a temporary variable to save a (random) numpy array. While I expected to have a list of random numpy arrays, the output was a list filled with the same (final) numpy array. I suspect that calling a numpy array by its "variable name" returns its memory address. Am I on the right direction, or are there anything that would be good to know?

[Code]

import numpy as np

sample_pt=np.array([0.]) # initial point
sample_list=[]
number_iter=3

for _ in range(number_iter):
    sample_pt[0]=np.random.randn()
    sample_list.append(sample_pt)
    print(sample_list)

[Output]

[array([-0.78614157])]
[array([0.7172035]), array([0.7172035])]
[array([0.47565398]), array([0.47565398]), array([0.47565398])]
  • The point of the duplicate is that it's dangerous to `append` to a list the same mutable object. If using list append, create a new object each time. – hpaulj Jun 19 '18 at 01:44

1 Answers1

0

I don't know what you mean by "call values", or "rather than memory address", or… most of the text of your question.

But the problem is pretty simple. You're appending the same array over and over, instead of creating new ones.

If you want to create a new array, you have to do that explicitly. Which is trivial to do; just move the np.array constructor into the loop, like this:

sample_list=[]
number_iter=3

for _ in range(number_iter):
    sample_pt=np.array([0.]) # initial point
    sample_pt[0]=np.random.randn()
    sample_list.append(sample_pt)
    print(sample_list)

But this can be dramatically simplified.

First, instead of creating an array of 1 zero and then replacing that zero, why not just create an array of the element you want?

sample_pt = np.array([np.random.randn()])

Or, even better, why not just let np.random build the array for you?

sample_pt = np.random.randn(1)

At which point you could replace the whole thing with a list comprehension:

number_iter = 3
sample_list = [np.random.randn(1) for _ in range(number_iter)]

Or, even better, why not make a 3x1 array instead of a list of 3 single-element arrays?

number_iter = 3
sample_array = np.random.randn((number_iter, 1))

If you really need to change that into a list of 3 arrays for some reason, you can always call list on it later:

sample_list = list(sample_array)

… or right at the start:

sample_list = list(np.random.randn((number_iter, 1)))

Meanwhile, I think you misunderstand how values and variables work in Python.

First, forget about "memory address" for a second:

  • An object is a value, with a type, somewhere in the heap. You don't care where.
  • Variables don't have memory addresses, or types; they're just names in some namespace (globals, locals, attributes of some instance, etc.) that refer to some value somewhere.

Notice that this is very different from, say, C++, where variables are typed memory locations, and objects live in those memory locations. This means there's no "copy constructor" or "assignment operator" or anything like that in Python. When you write a = b, all that means is that a is now another name for the same value as b. If you want a copy, you have to explicitly ask for a copy.

Now, if you look at how CPython implements things under the hood:

  • The CPython interpreter represents all objects as pointers to PyObject structs, which are always allocated on the heap.
  • Variables are just string keys in a dict, owned by the module (for globals), an instance (for attributes), or whatever. The values in the dict are just objects like any other. Which means that, under the covers, what's actually stored in the hash table is pointers to string objects for the variable names in the keys, and pointers to whatever value you've assigned in the values.
  • There is a special optimization for locals, involving an array of object pointers stored on the frame, but you usually don't have to worry about that.
  • There's another special trick for closure captures, involving pointers to cell objects that hold pointers to the actual objects, which you have to worry about even less often.

As you can see, thinking about the pointers is harder to understand, and potentially misleading, unless you really care about how CPython works under the covers.

abarnert
  • 354,177
  • 51
  • 601
  • 671