51

What is the difference between two arrays whose shapes are (442,1) and (442,)?

Printing both of these produces an identical output, but when I check for equality with ==, I get a 2D array like this:

array([[ True, False, False, ..., False, False, False],
       [False,  True, False, ..., False, False, False],
       [False, False,  True, ..., False, False, False],
       ..., 
       [False, False, False, ...,  True, False, False],
       [False, False, False, ..., False,  True, False],
       [False, False, False, ..., False, False,  True]], dtype=bool)

Can someone explain the difference?

goelakash
  • 5
    unutbu makes the key insight in a short comment below. To expand: numpy array shapes are returned as python [tuples](https://www.tutorialspoint.com/python/python_tuples.htm) which, unlike a python list, can't straightforwardly be written down with a single entry: `(442)` would just evaluate to the integer `442`, unlike `[442]`. The extra comma is just an aspect of python tuple syntax for single-element tuples to distinguish them from integers, not anything specific to do with numpy arrays. – Jess Riedel Jul 18 '18 at 12:37

1 Answer

60

An array of shape (442, 1) is 2-dimensional. It has 442 rows and 1 column.

An array of shape (442, ) is 1-dimensional and consists of 442 elements.

Note that their reprs should look different too: the number and placement of square brackets differ. Their shapes differ as well:

In [7]: np.array([1,2,3]).shape
Out[7]: (3,)

In [8]: np.array([[1],[2],[3]]).shape
Out[8]: (3, 1)

Note that you could use np.squeeze to remove axes of length 1:

In [13]: np.squeeze(np.array([[1],[2],[3]])).shape
Out[13]: (3,)
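
np.squeeze is not the only way to move between the two shapes. As a minimal sketch (the variable names here are illustrative, not from the question), reshape(-1) and ravel() also flatten a single column, and indexing with np.newaxis goes the other way:

```python
import numpy as np

col = np.array([[1], [2], [3]])   # 2-D column, shape (3, 1)

flat_a = col.reshape(-1)          # 1-D, shape (3,)
flat_b = col.ravel()              # 1-D, shape (3,)

# Going the other way: add a trailing axis of length 1.
back = flat_a[:, np.newaxis]      # 2-D again, shape (3, 1)

print(flat_a.shape, flat_b.shape, back.shape)   # (3,) (3,) (3, 1)
```

Unlike np.squeeze, reshape(-1) and ravel() flatten every axis, so they are only equivalent to it when all but one axis has length 1.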

NumPy broadcasting rules allow new axes to be automatically added on the left when needed. So (442,) can broadcast to (1, 442). And axes of length 1 can broadcast to any length. So when you test for equality between an array of shape (442, 1) and an array of shape (442, ), the second array gets promoted to shape (1, 442) and then the two arrays expand their axes of length 1 so that they both become broadcasted arrays of shape (442, 442). This is why when you tested for equality the result was a boolean array of shape (442, 442).

In [15]: np.array([1,2,3]) == np.array([[1],[2],[3]])
Out[15]: 
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

In [16]: np.array([1,2,3]) == np.squeeze(np.array([[1],[2],[3]]))
Out[16]: array([ True,  True,  True], dtype=bool)
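
To check ahead of time what shape a comparison will produce, one option is np.broadcast, which reports the broadcast shape without computing the result. The sketch below uses made-up data with the shapes from the question; the names a and b are assumptions for illustration:

```python
import numpy as np

a = np.arange(442).reshape(442, 1)   # 2-D column, shape (442, 1)
b = np.arange(442)                   # 1-D, shape (442,)

# Shape that the two operands broadcast to under NumPy's rules
result_shape = np.broadcast(a, b).shape
print(result_shape)                  # (442, 442)

# To compare element by element, flatten the column first
same = np.array_equal(a.ravel(), b)
print(same)                          # True
```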
unutbu
  • 1
    Thanks. I am new to data mining and couldn't understand the ([value], ) syntax as opposed to the normal ([value]) syntax for array shapes. That extra comma was making things convoluted. – goelakash Dec 19 '14 at 17:26
  • 22
    The comma in `(442, )` indicates the expression is a tuple. It's a tuple with one element inside. Without the comma, `(442)` gets evaluated as the integer `442`. The shape of an array is always a tuple. – unutbu Dec 19 '14 at 17:36
  • 4
    Are arrays of size (1,442) and (442,) the same then? – Bikash Gyawali Mar 08 '17 at 10:51
  • 1
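    @bikashg not quite: (1,442) is 2-dimensional and (442,) is 1-dimensional, though (442,) broadcasts to (1,442) when combined with a 2-D array – Florian Courtial Aug 24 '17 at 16:07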
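
The two points raised in these comments can be checked directly. This is a small sketch with placeholder arrays of zeros (the names row and vec are illustrative): it shows that the trailing comma is what makes a one-element tuple, and that shapes (1, 442) and (442,) are not the same thing even though they broadcast together.

```python
import numpy as np

# Parentheses alone do not make a tuple; the trailing comma does.
print(type((442)))     # <class 'int'>
print(type((442,)))    # <class 'tuple'>

row = np.zeros((1, 442))   # 2-D: one row, 442 columns
vec = np.zeros(442)        # 1-D: 442 elements

print(row.ndim, vec.ndim)      # 2 1
print(row[0].shape)            # (442,)  indexing the row gives a 1-D array
print((row == vec).shape)      # (1, 442)  vec is broadcast to (1, 442)
```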