
I am wondering why NumPy has both one-dimensional arrays of shape (length, 1) and one-dimensional arrays of shape (length,), without a second value.

I am running into this quite frequently, e.g. when using np.concatenate() which then requires a reshape step beforehand (or I could directly use hstack/vstack).

I can't think of a reason why this behavior is desirable. Can someone explain?

Edit:
It was suggested in one of the comments that my question is a possible duplicate. I am more interested in the underlying logic of NumPy than in the fact that there is a distinction between 1d and 2d arrays, which I think is the point of the mentioned thread.

Travis_Dudeson
Dahlai
  • `(x,)` refers to a vector, not a matrix. – kennytm Jul 15 '16 at 17:42
  • NumPy is built around n-dimensional arrays, not matrices. – user2357112 Jul 15 '16 at 17:44
  • @kennytm Thanks for the input, but what is the reason that a vector would not be represented as `(x,1)`? – Dahlai Jul 15 '16 at 17:45
  • @user2357112 reworded accordingly – Dahlai Jul 15 '16 at 17:46
  • A numpy array with shape (x, 1) is a *two-dimensional array*. The second dimension just happens to have length 1. A numpy array with shape (x,) is a *one-dimensional array*. It has no second dimension. For much of numpy, you can stop thinking about "vectors" and "matrices", and just think about n-dimensional arrays. – Warren Weckesser Jul 15 '16 at 17:55
  • Possible duplicate of [numpy: 1D array with various shape](http://stackoverflow.com/questions/15680593/numpy-1d-array-with-various-shape) – Benjamin Jul 15 '16 at 19:19

2 Answers


The data of an ndarray is stored as a 1d buffer - just a block of memory. The multidimensional nature of the array is produced by the shape and strides attributes, and the code that uses them.
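As a rough illustration of how shape and strides map indices onto the flat buffer, here is a minimal sketch. The array `a`, its int64 dtype, and the indices `i`, `j` are my own choices for the example, not something from the question.

```python
import numpy as np

# dtype is fixed so the byte counts below do not depend on the platform.
a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Strides are the number of bytes to step along each axis.
print(a.shape, a.strides)  # (3, 4) (32, 8)

# Locate element [i, j] by hand in the flat buffer.
i, j = 2, 1
byte_offset = i * a.strides[0] + j * a.strides[1]
flat_index = byte_offset // a.itemsize

# For a C-contiguous array, ravel() is a view of the same memory.
flat = a.ravel()
print(flat[flat_index], a[i, j])  # 9 9
```

The same arithmetic works for any number of dimensions: one stride per axis, summed into a single offset.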

The numpy developers chose to allow for an arbitrary number of dimensions, so the shape and strides are represented as tuples of any length, including 0 and 1.
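To make the "tuples of any length" point concrete, a small sketch with arrays of 0, 1, 2 and 3 dimensions (the variable names are just for illustration):

```python
import numpy as np

scalar = np.array(5)            # 0-d array, shape ()
vector = np.arange(3)           # 1-d array, shape (3,)
column = vector.reshape(3, 1)   # 2-d array, shape (3, 1)
cube = np.zeros((2, 3, 4))      # 3-d array, shape (2, 3, 4)

# shape and strides always have exactly ndim entries.
for arr in (scalar, vector, column, cube):
    print(arr.ndim, arr.shape, len(arr.strides))
```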

In contrast, MATLAB was built around FORTRAN programs that were developed for matrix operations. In the early days everything in MATLAB was a 2d matrix. In the late 1990s (MATLAB 5) it was generalized to allow more than 2 dimensions, but never fewer. NumPy's np.matrix still follows that old 2d MATLAB constraint.

If you come from a MATLAB world you are used to these 2 dimensions, and to the distinction between a row vector and a column vector. But in math and physics that isn't influenced by MATLAB, a vector is a 1d array. Python lists are inherently 1d, as are C arrays. To get 2d you have to have lists of lists or arrays of pointers to arrays, with the x[1][2] style of indexing.
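For comparison, a short sketch of the nested-list style next to NumPy's multidimensional indexing (the data here is made up for the example):

```python
import numpy as np

nested = [[0, 1, 2], [3, 4, 5]]   # a list of lists: "2d" only by nesting
arr = np.array(nested)            # a genuine 2d array, shape (2, 3)

print(nested[1][2])  # 5: index the outer list, then the inner one
print(arr[1, 2])     # 5: a single index with one entry per dimension
print(arr[1][2])     # 5: the chained style also works, via a 1d row
print(arr[1].shape)  # (3,): indexing one axis away leaves a 1d array
```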

Look at the shape and strides of this array and its variants (the stride values shown are for a 4-byte integer dtype):

In [48]: x=np.arange(10)

In [49]: x.shape
Out[49]: (10,)

In [50]: x.strides
Out[50]: (4,)

In [51]: x1=x.reshape(10,1)

In [52]: x1.shape
Out[52]: (10, 1)

In [53]: x1.strides
Out[53]: (4, 4)

In [54]: x2=np.concatenate((x1,x1),axis=1)

In [55]: x2.shape
Out[55]: (10, 2)

In [56]: x2.strides
Out[56]: (8, 4)

MATLAB adds new dimensions at the end. It orders its values like an order='F' array, and can readily change a (n,1) matrix to a (n,1,1,1). NumPy defaults to order='C', and readily adds new dimensions at the start. Understanding this is essential when taking advantage of broadcasting.

Thus x1 + x is a (10,1)+(10,) => (10,1)+(1,10) => (10,10)

Because of broadcasting, a (n,) array is more like a (1,n) one than a (n,1) one. A 1d array is more like a row matrix than a column one.

In [64]: np.matrix(x)
Out[64]: matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [65]: _.shape
Out[65]: (1, 10)
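The broadcasting behaviour described above can be checked directly. This is a small sketch that reuses the names `x` and `x1` from the session; the explicit `x[None, :]` and `x[:, None]` forms are added only for comparison.

```python
import numpy as np

x = np.arange(10)        # shape (10,)
x1 = x.reshape(10, 1)    # shape (10, 1)

# (10,1) + (10,) : the 1d operand is treated as (1,10), giving (10,10).
print((x1 + x).shape)                            # (10, 10)

# Writing the leading dimension explicitly gives the same result.
print(np.array_equal(x1 + x, x1 + x[None, :]))   # True

# A trailing dimension is never added automatically; ask for it yourself.
print(x[:, None].shape)                          # (10, 1)
print(np.array_equal(x[:, None], x1))            # True
```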

The point with concatenate is that it requires matching dimensions. It does not use broadcasting to adjust dimensions. There are a bunch of stack functions that ease this constraint, but they do so by adjusting the dimensions before using concatenate. Look at their code (readable Python).
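A minimal sketch of that difference, assuming the same kind of `x` and `x1` arrays as in the session above. The flag `mismatch_rejected` is only there to record what happened.

```python
import numpy as np

x = np.arange(10)        # shape (10,)
x1 = x.reshape(10, 1)    # shape (10, 1)

# concatenate does not broadcast: a 2d and a 1d input are rejected.
mismatch_rejected = False
try:
    np.concatenate((x1, x), axis=1)
except ValueError as err:
    mismatch_rejected = True
    print("concatenate failed:", err)

# Fix the dimensions yourself ...
a = np.concatenate((x1, x[:, None]), axis=1)

# ... or let a stack helper do the adjustment before concatenating.
b = np.column_stack((x1, x))

print(a.shape, b.shape, np.array_equal(a, b))  # (10, 2) (10, 2) True
```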

So a proficient numpy user needs to be comfortable with that generalized shape tuple, including the empty () (0d array), (n,) 1d, and up. For more advanced stuff understanding strides helps as well (look for example at the strides and shape of a transpose).
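As a sketch of the transpose example: the array below is a stand-in with the same shape as `x2` above, created with an explicit int32 dtype so the stride values are predictable.

```python
import numpy as np

x2 = np.zeros((10, 2), dtype=np.int32)
print(x2.shape, x2.strides)      # (10, 2) (8, 4)

# The transpose swaps shape and strides; no data is moved.
t = x2.T
print(t.shape, t.strides)        # (2, 10) (4, 8)
print(np.shares_memory(x2, t))   # True
```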

hpaulj
  • Thanks, this was very helpful for understanding the origin of NumPy's behavior. I am wondering now what the benefit of these 1-dimensional arrays is. Why are they not hidden under the hood, so that to the user everything is at minimum a `(x,1)` array? Are there specific cases where `(x,)` arrays have advantages? – Dahlai Jul 17 '16 at 11:32
  • A `(x,1)` displays as a column. A `(1,x)` as a row, but with an extra set of []. I generate 1d arrays all the time, e.g. `np.arange(10)`. I may reshape it to `(5,2)`. Only when I need to broadcast it do I add a trailing dimension with `[:,None]`. – hpaulj Mar 19 '17 at 04:59
  • Can we say that `(x,)` is a vector whereas `(x,1)` is a matrix? – weima Jul 24 '17 at 09:20

Much of it is a matter of syntax. In Python, (x) isn't a tuple at all (the parentheses are redundant). (x,), however, is.
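A quick way to see the syntax point in plain Python, with a NumPy check added. The names `a` and `b` are just for this sketch.

```python
import numpy as np

a = (3)     # just the int 3; the parentheses do nothing
b = (3,)    # a one-element tuple; the comma makes it a tuple

print(type(a).__name__, type(b).__name__)  # int tuple

# NumPy accepts either as a shape, and always reports a tuple.
print(np.zeros(a).shape)  # (3,)
print(np.zeros(b).shape)  # (3,)
```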

The difference between (x,) and (x,1) goes even further. You can look at the examples in previous questions on this topic. Quoting an example from one of them, this is a 1D NumPy array:

>>> np.array([1, 2, 3]).shape
(3,)

But this one is 2D:

>>> np.array([[1, 2, 3]]).shape
(1, 3)

Reshape does not make a copy unless it needs to, so it should be safe to use.
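A small sketch of when reshape returns a view and when it has to copy, using `np.shares_memory` to check. The arrays are made up for the example.

```python
import numpy as np

x = np.arange(6)
col = x.reshape(6, 1)              # same buffer, new shape and strides
print(np.shares_memory(x, col))    # True

# Because it is a view, writing through it changes the original too.
col[0, 0] = 99
print(x[0])                        # 99

# A transposed array is not contiguous, so flattening it forces a copy.
m = np.arange(6).reshape(2, 3)
flat = m.T.reshape(6)
print(np.shares_memory(m, flat))   # False
```

So the reshape itself is cheap, but keep in mind that the result usually shares data with the original.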

Community
armatita