How is the memory allocated for numpy arrays in python?

Question

I am trying to understand how numpy "2D" arrays differ, that is, numpy.zeros((3, )), numpy.zeros((3, 1)), and numpy.zeros((1, 3)).

I used id to look at the memory address of each element, but I got some strange output in the IPython console.

a = np.zeros((1, 3))
In [174]: id(a[0, 0])
Out[174]: 4491074656

In [175]: id(a[0, 1])
Out[175]: 4491074680

In [176]: id(a[0, 2])
Out[176]: 4491074704

In [177]: id(a[0, 0])
Out[177]: 4491074728

In [178]: id(a[0, 1])
Out[178]: 4491074800

In [179]: id(a)
Out[179]: 4492226688

In [180]: id(a[0, 1])
Out[180]: 4491074752

The memories of the elements are not consecutive, and they are changing without reassignment.

Moreover, the elements of the array of shape (1, 3) seem to be consecutive in memory at first, but that is not the case for other shapes, for example:

In [186]: a = np.zeros((3, ))

In [187]: id(a)
Out[187]: 4490927280

In [188]: id(a[0])
Out[188]: 4491075040

In [189]: id(a[1])
Out[189]: 4491074968

In [191]: a = np.random.rand(4, 1)

In [192]: id(a)
Out[192]: 4491777648

In [193]: id(a[0])
Out[193]: 4491413504

In [194]: id(a[1])
Out[194]: 4479900048

In [195]: id(a[2])
Out[195]: 4491648416

I am actually not sure whether id is suitable for inspecting memory in Python. As far as I know, there is no easy way to get the physical address of a variable in Python.

As in C or Java, I expected the elements of such "2D" arrays to be consecutive in memory, which does not seem to be the case. Besides, the results of id keep changing, which really confuses me.

I am interested in this because I am using mpi4py a little, and I want to figure out how variables are sent and received between CPUs.

according to [this question](https://stackoverflow.com/questions/39377866/does-numpy-internally-store-size-of-an-array) and [this question](https://stackoverflow.com/questions/40911491/some-confusions-on-how-numpy-array-stored-in-python), numpy does not store array values (?) so when you use `id()`, it creates the array on the fly and that's why "The memories of the elements are changing without reassignment" (?) — AcaNg, Jul 30 '19 at 00:54
`id` tells you nothing about this. Those 3 arrays use the same underlying data buffer structure. Basic numpy documentation describes `ndarray` structure. — hpaulj, Jul 30 '19 at 01:10
`mpi4py` talks of using the Python buffer-protocol (and `pickle` for other kinds of objects). `numpy` uses this. https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/ is a sample introduction. There probably are new, more complete descriptions. — hpaulj, Jul 30 '19 at 03:15
`a.__array_interface__['data'][0]` is an integer representation of the start of the data buffer of array `a`. Views of `a` will have values near by. e.g. `a[1:2]`, not `a[1]`. — hpaulj, Jul 30 '19 at 03:56

score 5 · Accepted Answer · answered Jul 30 '19 at 01:10


A NumPy array stores its data in a memory area separate from the array object itself, as the following image shows:

[Image: the ndarray object and its separate data buffer]
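As a rough illustration of that separation, the sketch below builds the three shapes from the question as views on one buffer. The names flat, col, and row are made up for this example, and it assumes the default float64 dtype. Each view is a distinct Python object, yet all three report the same data address; only the shape and strides metadata differ.

```python
import numpy as np

flat = np.zeros(3)          # shape (3,)
col = flat.reshape(3, 1)    # shape (3, 1), a view on the same buffer
row = flat.reshape(1, 3)    # shape (1, 3), also a view on the same buffer

for arr in (flat, col, row):
    # The data address is identical for all three arrays;
    # only the shape and strides metadata differ.
    print(arr.shape, arr.strides, arr.ctypes.data)

# The array objects are distinct, but they share one data buffer.
print(flat is col, np.shares_memory(flat, col), np.shares_memory(flat, row))
```

Writing through one view, for example col[1, 0] = 5.0, is then visible as flat[1] and row[0, 1], because no element data was copied.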

To get the address of the data, create views of the array and check their ctypes.data attribute, which is the address of the first data element of each view:

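import numpy as np

a = np.zeros((3, 2))           # float64, C order: 8-byte items, 16-byte rows
print(a.ctypes.data)           # address of the start of the data buffer
print(a[0:1, 0].ctypes.data)   # same address: the view starts at a[0, 0]
print(a[0:1, 1].ctypes.data)   # start + 8:  the view starts at a[0, 1]
print(a[1:2, 0].ctypes.data)   # start + 16: the view starts at a[1, 0]
print(a[1:2, 1].ctypes.data)   # start + 24: the view starts at a[1, 1]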
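This also suggests why id gave confusing results in the question. The sketch below assumes CPython, where id returns the address of a Python object. Indexing a single element such as a[0, 0] builds a new np.float64 object holding a copy of the value, so id reports where that temporary object lives, not where the element sits in the buffer. The element addresses themselves can be computed from the buffer start and the strides; the names x, y, base, and addresses are only illustrative.

```python
import numpy as np

a = np.zeros((1, 3))

# Each indexing operation creates a fresh np.float64 object,
# so x and y are two distinct Python objects with equal values.
x = a[0, 0]
y = a[0, 0]
print(type(x).__name__, x is y)  # float64 False

# Element addresses inside the buffer follow from the start address
# and the stride (in bytes) along the last axis.
base = a.ctypes.data
addresses = [base + j * a.strides[1] for j in range(a.shape[1])]
print([addr - base for addr in addresses])  # [0, 8, 16]
```

Because the temporary scalar objects are freed as soon as id returns, their memory can be reused, which would explain why repeated calls such as id(a[0, 1]) print different numbers.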


HYRY


Where did you get the nice visualization from? – user3731622 Sep 23 '22 at 20:41
