I am trying to understand how the numpy "2D" arrays numpy.zeros((3, )), numpy.zeros((3, 1)) and numpy.zeros((1, 3)) differ.

I used id to look at the memory address of each element, but I got some strange output in the IPython console.

a = np.zeros((1, 3))
In [174]: id(a[0, 0])
Out[174]: 4491074656

In [175]: id(a[0, 1])
Out[175]: 4491074680

In [176]: id(a[0, 2])
Out[176]: 4491074704

In [177]: id(a[0, 0])
Out[177]: 4491074728

In [178]: id(a[0, 1])
Out[178]: 4491074800

In [179]: id(a)
Out[179]: 4492226688

In [180]: id(a[0, 1])
Out[180]: 4491074752

The addresses of the elements are

  1. not consecutive
  2. changing without reassignment

Moreover, the elements of the (1, 3) array at least appear to sit at successive addresses at first, but that is not the case for other shapes, for example:

In [186]: a = np.zeros((3, ))

In [187]: id(a)
Out[187]: 4490927280

In [188]: id(a[0])
Out[188]: 4491075040

In [189]: id(a[1])
Out[189]: 4491074968

In [191]: a = np.random.rand(4, 1)

In [192]: id(a)
Out[192]: 4491777648

In [193]: id(a[0])
Out[193]: 4491413504

In [194]: id(a[1])
Out[194]: 4479900048

In [195]: id(a[2])
Out[195]: 4491648416

I am not sure whether id is a suitable way to inspect memory in Python. As far as I know, there is no easy way to get the physical address of a variable in Python.

As in C or Java, I expected the elements of such "2D" arrays to be consecutive in memory, which does not seem to be the case. Besides, the results of id keep changing, which really confuses me.

I am interested in this because I am using mpi4py a little, and I want to figure out how variables are sent and received between CPUs.

hpaulj
hzfmer
  • According to [this question](https://stackoverflow.com/questions/39377866/does-numpy-internally-store-size-of-an-array) and [this question](https://stackoverflow.com/questions/40911491/some-confusions-on-how-numpy-array-stored-in-python), numpy does not store array values (?) so when you use `id()`, it creates the array on the fly and that's why "The memories of the elements are changing without reassignment" (?) – AcaNg Jul 30 '19 at 00:54
  • `id` tells you nothing about this. Those 3 arrays use the same underlying data buffer structure. Basic numpy documentation describes `ndarray` structure. – hpaulj Jul 30 '19 at 01:10
  • `mpi4py` talks of using the Python buffer-protocol (and `pickle` for other kinds of objects). `numpy` uses this. https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/ is a sample introduction. There probably are new, more complete descriptions. – hpaulj Jul 30 '19 at 03:15
  • `a.__array_interface__['data'][0]` is an integer representation of the start of the data buffer of array `a`. Views of `a` will have values near by. e.g. `a[1:2]`, not `a[1]`. – hpaulj Jul 30 '19 at 03:56
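
The last two comments can be tried out with a short sketch. It assumes a small 1-D float64 array; the names `base`, `view`, `view_addr` and `mv` are illustrative and do not come from the thread. It only uses the standard `memoryview` type to consume the buffer protocol and does not call mpi4py itself.

```python
import numpy as np

a = np.zeros(3)  # 1-D float64 array, 8 bytes per element

# Integer address of the start of a's data buffer.
base = a.__array_interface__['data'][0]

# A slice is a view into the same buffer, so its start address is
# the base address plus a byte offset.
view = a[1:2]
view_addr = view.__array_interface__['data'][0]
print(view_addr - base, a.itemsize)   # offset of element 1, item size

# The buffer protocol exposes the same memory without copying:
# writing through the memoryview changes the array.
mv = memoryview(a)
mv[0] = 1.5
print(a[0], mv.nbytes, mv.contiguous)
```

The offset of the view equals one item size, and the write through `mv` is visible in `a`, which suggests that a buffer-aware consumer sees the raw data block rather than per-element Python objects.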

1 Answer

A NumPy array stores its data in a memory buffer that is separate from the array object itself.

[Figure: the ndarray object and its separate data buffer]
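
As a rough illustration of that separation, the sketch below builds the three shapes from the question and prints the metadata that differs (shape and strides) next to what stays the same (a single contiguous 24-byte buffer). The variable names `flat`, `col` and `sizes` are illustrative only.

```python
import numpy as np

sizes = []
for shape in [(3,), (3, 1), (1, 3)]:
    arr = np.zeros(shape)
    # Shape and strides live in the array object; the data itself is
    # one contiguous block of 3 float64 values in every case.
    print(shape, arr.strides, arr.nbytes, arr.flags['C_CONTIGUOUS'])
    sizes.append(arr.nbytes)

# Reshaping a contiguous array creates a new array object that
# points at the very same data buffer.
flat = np.zeros(3)
col = flat.reshape(3, 1)
print(flat.ctypes.data == col.ctypes.data)   # same buffer address
print(id(flat) == id(col))                   # different Python objects
```

So `id` distinguishes the Python objects, while the buffer address and the strides describe where the numbers actually sit.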

To get the address of an element, create a view of the array that starts at that element and read its ctypes.data attribute, which is the address of the view's first data element:

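import numpy as np

a = np.zeros((3, 2))           # float64, C order: 8-byte items, 16-byte rows
print(a.ctypes.data)           # address of the first element, a[0, 0]
# Slicing (unlike a[i, j]) returns a view, which exposes its own data address.
print(a[0:1, 0].ctypes.data)   # a[0, 0]: same as a.ctypes.data
print(a[0:1, 1].ctypes.data)   # a[0, 1]: base + 8
print(a[1:2, 0].ctypes.data)   # a[1, 0]: base + 16
print(a[1:2, 1].ctypes.data)   # a[1, 1]: base + 24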
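
The sketch below is one way to see why `id(a[0, 1])` in the question keeps changing, and how the same addresses can be computed from the strides instead. The helper `element_address` is a hypothetical name introduced here for illustration; it is not a NumPy function.

```python
import numpy as np

a = np.zeros((1, 3))

# Indexing a single element builds a new NumPy scalar object each time,
# so id() reports where that temporary object lives, not where the
# element sits inside the data buffer.
x = a[0, 1]
y = a[0, 1]
print(type(x).__name__, x is y)   # two distinct scalar objects


def element_address(arr, i, j):
    """Hypothetical helper: address of arr[i, j] from base address and strides."""
    return arr.ctypes.data + i * arr.strides[0] + j * arr.strides[1]


# The computed address agrees with the address of a view that starts
# at the same element.
print(element_address(a, 0, 1) == a[0:1, 1:2].ctypes.data)

# Neighbouring elements in a row are exactly one item size apart.
print(element_address(a, 0, 2) - element_address(a, 0, 1), a.itemsize)
```

The elements are therefore consecutive in the buffer, as in C; only the temporary scalar objects handed back by indexing move around.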
HYRY