I have a NumPy array (A) of shape (100000, 28, 28).
I reshape it using A.reshape(-1, 28*28).

This is very common in machine learning pipelines. How does it work? I have never understood the meaning of '-1' in reshape.

This exact question has been asked before, but it never got a solid explanation. Can anyone explain?

Anuj Gupta
  • See the documentation for that: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy-reshape - `The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.` – cel Jan 21 '17 at 07:09
  • In your example, the resulting array will have shape (100000, 784). The -1 is shorthand for the first dimension, because only one value is valid there. – hpaulj Jan 21 '17 at 08:20

2 Answers


In NumPy, you can create a 100x100 array like this:

import numpy as np
x = np.zeros((100, 100))  # 100 * 100 = 10000 items, all initialized to 0
x.shape  # outputs: (100, 100)

NumPy internally stores all 10000 items in one flat block of 10000 items, regardless of the shape of the array object. This lets us change the shape to any dimensions we like, as long as the total number of items does not change.

For example, reshaping our array to 10x1000 is fine, because we keep the 10000 items:

x = x.reshape(10, 1000)
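
To make the "same items, different shape" idea concrete, here is a minimal sketch. It assumes a contiguous array built with `np.arange` (my choice, so the items are distinguishable); for such an array `reshape` hands back a view on the same buffer rather than a copy.

```python
import numpy as np

x = np.arange(10000).reshape(100, 100)  # 10000 items viewed as 100 x 100
y = x.reshape(10, 1000)                 # the same 10000 items viewed as 10 x 1000

same_count = x.size == y.size                      # the item count is unchanged
same_order = np.array_equal(x.ravel(), y.ravel())  # the items keep their order
shares_buffer = np.shares_memory(x, y)             # no copy for this contiguous array
print(same_count, same_order, shares_buffer)
```

All three checks should print `True`. For non-contiguous arrays (for example, after a transpose) `reshape` may have to copy, so the last check is not guaranteed in general.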

Reshaping to 10x2000 won't work, because we do not have enough items in the array:

x.reshape(10, 2000)
ValueError: total size of new array must be unchanged
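
If you want to see the failure without crashing a script, a small sketch like the following catches the exception. The exact wording of the error message depends on the NumPy version, so the code only checks the exception type.

```python
import numpy as np

x = np.zeros((100, 100))
try:
    x.reshape(10, 2000)  # needs 20000 items, but x has only 10000
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```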

So, back to the -1 question: -1 is the notation for an unknown dimension. It means "let NumPy fill in the missing dimension with the correct value, so that my array keeps the same number of items".

So this:

x = x.reshape(10, 1000)

is equivalent to this:

x = x.reshape(10, -1) 

Internally, NumPy just calculates 10000 / 10 = 1000 to get the missing dimension.
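
The same arithmetic can be reproduced by hand. This is only a sketch of the idea, not NumPy's actual implementation; the name `inferred` is mine. It also shows what happens when the known dimensions do not divide the item count evenly, in which case no valid value for -1 exists.

```python
import numpy as np

x = np.zeros((100, 100))
inferred = x.size // 10           # 10000 // 10 = 1000
y = x.reshape(10, -1)
print(y.shape == (10, inferred))

# 10000 is not divisible by 3, so there is nothing NumPy could put in place of -1.
try:
    x.reshape(3, -1)
    divisible = True
except ValueError:
    divisible = False
print(divisible)
```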

The -1 can also appear at the start of the shape or in the middle.

The two examples above are equivalent to this:

x = x.reshape(-1, 1000)
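
For the "in the middle" case you need at least three dimensions. A short sketch, using shapes I picked for illustration:

```python
import numpy as np

x = np.zeros((100, 100))
middle = x.reshape(10, -1, 100)  # -1 becomes 10000 / (10 * 100) = 10
flat = x.reshape(-1)             # a lone -1 flattens the array to one dimension
print(middle.shape, flat.shape)
```

The lone `-1` form is a common way to flatten an array without spelling out its length.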

If we try to mark two dimensions as unknown, NumPy raises an exception: it cannot know what we mean, because there is more than one way to reshape the array.

x = x.reshape(-1, -1)
ValueError: can only specify one unknown dimension
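
To see why two unknowns are ambiguous, this sketch lists every 2-D shape that would hold 10000 items (the `candidates` list is just an illustration) and then confirms that NumPy refuses to pick one.

```python
import numpy as np

n = 10000
# Every (rows, cols) pair whose product is n is a valid 2-D shape.
candidates = [(rows, n // rows) for rows in range(1, n + 1) if n % rows == 0]
print(len(candidates), candidates[:4])

x = np.zeros((100, 100))
try:
    x.reshape(-1, -1)  # which of the candidates should NumPy choose?
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)
```
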
ShmulikA
  • @MaxPowers: Can I say that you are allowed to leave one of the dimensions of the reshaped array unspecified, and that you use -1 for that "unspecified" dimension? At run time it is then inferred from the old size and the other dimensions of the reshaped array? – Anuj Gupta Feb 02 '17 at 12:03
  • NumPy just calculates it for you so that you do not have to write it explicitly; there is no meaningful runtime cost. Yes, you are right: it is inferred from the old size, since `reshape()` cannot change the number of array items. – ShmulikA Feb 02 '17 at 12:05

It means that the size of the dimension for which you passed -1 is inferred. Thus,

A.reshape(-1, 28*28)

means, "reshape A so that its second dimension has a size of 28*28 and calculate the correct size of the first dimension".

See the documentation of reshape.
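
Applied to the array from the question, a minimal sketch looks like this. It uses 1000 images instead of 100000 purely to keep memory use small; `n_images` and the zero-filled data are assumptions for illustration.

```python
import numpy as np

n_images = 1000  # hypothetical; the question uses 100000
A = np.zeros((n_images, 28, 28), dtype=np.uint8)

flat = A.reshape(-1, 28 * 28)        # the first dimension is inferred
explicit = A.reshape(n_images, 784)  # same result with every dimension spelled out
print(flat.shape)

# Each row of the result is one 28 x 28 image laid out as 784 values.
row_matches = np.array_equal(flat[0], A[0].ravel())
print(row_matches)
```

Using -1 here means the same line keeps working if the number of images changes.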

MaxPowers