
Can someone explain the Numpy design decision to keep single elements of arrays as distinct from Python scalars?

The following code works without errors:

import numpy as np
a = np.array([1, 2, 3])
b = a[0]
print(b.size)

This illustrates that b is not a simple Python scalar; in fact, type(b) gives numpy.int32 rather than int.

Of course, if one defines b = 1, then b.size raises an error:

AttributeError: 'int' object has no attribute 'size'
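A minimal sketch of the contrast (the exact integer type of the element is platform-dependent, e.g. numpy.int32 on Windows and numpy.int64 on most 64-bit Linux/macOS builds):

```python
import numpy as np

a = np.array([1, 2, 3])
b = a[0]

# NumPy scalars carry array-like attributes that plain Python ints lack
print(type(b))   # a NumPy integer type, e.g. numpy.int32 or numpy.int64
print(b.size)    # 1
print(b.shape)   # ()
print(b.ndim)    # 0

# A plain Python int has none of these attributes
try:
    (1).size
except AttributeError as e:
    print(e)     # 'int' object has no attribute 'size'
```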

I find this difference of behaviour confusing and I am wondering what is its motivation.

divenex
  • This question is relevant, possibly a duplicate: http://stackoverflow.com/questions/773030/why-are-0d-arrays-in-numpy-not-considered-scalar – Alex Riley Dec 14 '15 at 16:02
  • This page of documentation seems to claim that default python functionality would cause incorrect behavior if used in many scientific computing settings: http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.scalars.html – BlackVegetable Dec 14 '15 at 16:04
  • @ajcr The same answer likely applies to this question, but the question itself does not strike me as a duplicate. – BlackVegetable Dec 14 '15 at 16:05

1 Answer


There is a difference between elements of an array and the object you get when indexing one.

The array has a data buffer: a block of bytes that NumPy manages with its own compiled code. Depending on the dtype, individual elements may be represented by 1, 4, 8, 16, etc. bytes.

In [478]: A=np.array([1,2,3])

In [479]: A.__array_interface__
Out[479]: 
{'data': (167487856, False),
 'descr': [('', '<i4')],
 'shape': (3,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}

We can view the data buffer as a list of bytes (displayed as characters):

In [480]: A.view('S1')
Out[480]: 
array(['\x01', '', '', '', '\x02', '', '', '', '\x03', '', '', ''], 
      dtype='|S1')
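The same layout can be checked directly from the array's byte-level attributes. This sketch pins the dtype to int32 so the 4-bytes-per-element layout matches the `<i4` typestr above; the exact byte order of the output depends on the platform's endianness:

```python
import numpy as np

A = np.array([1, 2, 3], dtype=np.int32)  # fix dtype so each element is 4 bytes, as in '<i4'

print(A.itemsize)   # 4  -> bytes per element
print(A.nbytes)     # 12 -> total bytes in the buffer (3 elements * 4 bytes)

# Raw buffer contents; on a little-endian machine each element's low byte comes first:
print(A.tobytes())  # e.g. b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
```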

When you select an element of A you get back a 0-dimensional, array-like object (a NumPy scalar):

In [491]: b=A[0]

In [492]: b.shape
Out[492]: ()

In [493]: b.__array_interface__
Out[493]: 
{'__ref': array(1),
 'data': (167480104, False),
 'descr': [('', '<i4')],
 'shape': (),
 'strides': None,
 'typestr': '<i4',
 'version': 3}

The type is different, but b has most of the same attributes as A: shape, strides, mean, etc.

You have to use .item() to access the underlying Python scalar:

In [496]: b.item()
Out[496]: 1

In [497]: type(b.item())
Out[497]: int

So you can think of b as a scalar with a NumPy wrapper. The __array_interface__ for b looks very much like that of np.array(1), a true 0-d array.
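A short sketch of that comparison: the NumPy scalar and the 0-d array share their shape and dtype, but are distinct types, and .item() unwraps either into a plain Python int:

```python
import numpy as np

b = np.array([1, 2, 3])[0]   # NumPy scalar obtained by indexing
z = np.array(1)              # true 0-d ndarray

# Both have an empty shape and the same dtype...
print(b.shape, z.shape)      # () ()
print(b.dtype == z.dtype)    # True

# ...but their types differ: b is a NumPy scalar, z is an ndarray
print(isinstance(b, np.generic))   # True
print(isinstance(z, np.ndarray))   # True

# .item() unwraps both into a plain Python int
print(type(b.item()))        # <class 'int'>
print(type(z.item()))        # <class 'int'>
```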

hpaulj
  • Thank you very much for the clear answer. Of course this explains what the difference really is, but not why this design choice was made. Is the distinction merely due to a technical limitation, because Numpy is just a package of Python (unlike e.g. MATLAB, or Mathematica, or IDL)? Or was the Numpy choice better than the MATLAB one, where the distinction does not exist? – divenex Dec 15 '15 at 10:14
  • `size(x(1))` returns `1,1` in Octave/MATLAB. In the original MATLAB everything was a 2d matrix; there were no scalars. That's weakened a bit in new versions. – hpaulj Dec 15 '15 at 17:20
  • The key difference is that in MATLAB `size(1)` is identical to `size(x(1))`: they both return `1,1`. A single array element is indistinguishable from a scalar. This is not the case in Numpy. – divenex Apr 27 '21 at 15:23