
I'm quite puzzled by this simple piece of python code:

import numpy as np
import random
import matplotlib.pyplot as plt

data      = np.arange(2500, 8000, 100)

logdata   = np.zeros((len(data))) + np.nan
randata   = logdata

for i in range(len(data)):
    logdata[i] = np.log(data[i])
    randata[i] = np.log(random.randint(2500, 8000))

plt.plot(logdata, randata, 'bo')

OK, I don't need a for loop in this specific instance (I'm just making a simple example), but what really confuses me is the role played by the initialisation of randata. I would expect that, by virtue of the for loop, randata would become a totally different array from logdata, but the two arrays turn out to be the same. I see from older discussions that the only way to prevent this from happening is to initialise randata on its own with randata = np.zeros((len(data))) + np.nan, or to make a copy with randata = logdata.copy(), but I don't understand why randata is so deeply linked to logdata through the for loop.
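
For what it's worth, a quick diagnostic along these lines (using np.shares_memory just to make the aliasing visible) does show that the two names refer to one and the same array, and that only copy() gives me an independent one:

import numpy as np

a = np.zeros(5) + np.nan
b = a                            # plain assignment: b is just another name for a
print(b is a)                    # True  -> same object
print(np.shares_memory(a, b))    # True  -> same underlying buffer

c = a.copy()                     # explicit copy: a new buffer is allocated
c[0] = 1.0                       # writing into c leaves a untouched
print(a[0])                      # nan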

If I were to give the following commands

logdata = np.zeros((len(data)))+np.nan 
randata   = logdata 
logdata = np.array([1,2,3]) 
print(randata)

then randata would still be an array of NaNs, unlike logdata. Why is that?
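
Checking identities as a diagnostic, I can see that the third line rebinds the name logdata to a brand-new array while randata keeps pointing at the original one, but I still don't see why the for loop case behaves differently:

import numpy as np

logdata = np.zeros(5) + np.nan
randata = logdata
print(logdata is randata)      # True: both names refer to the same array

logdata = np.array([1, 2, 3])  # rebinds the *name* logdata to a new array
print(logdata is randata)      # False: randata still holds the old array
print(randata)                 # [nan nan nan nan nan]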

  • Does this answer your question? [List changes unexpectedly after assignment. Why is this and how can I prevent it?](https://stackoverflow.com/questions/2612802/list-changes-unexpectedly-after-assignment-why-is-this-and-how-can-i-prevent-it) – mkrieger1 Jan 25 '22 at 00:55
  • 1
    `randata` *is* `logdata`, that's why they are "deeply linked". – mkrieger1 Jan 25 '22 at 00:56
  • Most, perhaps all, languages that care about performance (even Python, cough cough) treat `randata = logdata` as an aliasing/referential operation, not a deep copy of an arbitrarily large data structure. Also, worth understanding [Python Names and Values](https://nedbatchelder.com/text/names1.html) before you go too far. – jarmod Jan 25 '22 at 00:58
  • Also https://stackoverflow.com/questions/13530998/are-python-variables-pointers-or-else-what-are-they for your second, kind of "inverse" question. – mkrieger1 Jan 25 '22 at 01:00

1 Answer


Blckknght explains numpy assignment behavior in this post: Numpy array assignment with copy

B = A

This binds a new name B to the existing object already named A. Afterwards they refer to the same object, so if you modify one in place, you'll see the change through the other one too.

But to answer why they're "deeply linked" (or rather, why they point to the same location in memory): copying large arrays is computationally expensive, so plain assignment with = never copies the data. It simply binds another name to the same NumPy array, which owns a single block of memory. If you want independent arrays, you have to allocate new memory explicitly with the copy() method. This gives you the efficiency of C/C++ (where avoiding copies by passing pointers and references around is the norm) together with the ease of use of Python (which doesn't expose pointers and references directly).
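
Here is a minimal sketch of the distinction (np.shares_memory is used only to make the aliasing visible): element assignment such as randata[i] = ... writes into the shared buffer, so it shows up through every name bound to that array, whereas a plain = on the name itself just rebinds the name.

import numpy as np

a = np.zeros(3) + np.nan
b = a                            # alias: two names, one buffer
print(np.shares_memory(a, b))    # True

b[0] = 42.0                      # in-place write: visible through both names
print(a)                         # [42. nan nan]

c = a.copy()                     # copy(): a new buffer is allocated
c[1] = 7.0
print(a)                         # still [42. nan nan]

b = np.array([1.0, 2.0, 3.0])    # rebinding the name b; a is unaffected
print(np.shares_memory(a, b))    # False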

I'd say this is a feature, not a bug.