237

Is there a quick way to "sub-flatten" or flatten only some of the first dimensions in a numpy array?

For example, given a numpy array of dimensions (50,100,25), the resultant dimensions would be (5000,25)

Eric Leschinski
IssamLaradji
  • 1
    This might help http://stackoverflow.com/questions/13990465/3d-numpy-array-to-2d – Ankur Ankan Sep 12 '13 at 07:16
  • 1
    You need a refresher course on numpy ndarray array slicing. Also known as multi dimensional array indexing, see: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html Array slice your ndarray using square brackets, and use the comma delimiter to separate how much of each dimension you want. It will look something like (not exactly) this: `your_array[50:100, 7, :]` which flattens the 3d object to 2d, using only slice number 7 for the 2nd dimension. – Eric Leschinski Aug 31 '17 at 23:38
  • 2
    ^ Slices just take a subset, the poster wants to retain all the datapoints. I assume you mean `array[0:50,7,:]` which gives size `(50,25)`, dropping 99% of the data. – Sherman May 19 '22 at 06:39

5 Answers

212

Take a look at numpy.reshape.

>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape   
# (5000, 25)

# One shape dimension can be -1. 
# In this case, the value is inferred from 
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)
Alexander
  • 48
    Such solutions seem a tiny bit inelegant to me, as they require some redundant information. I wish there were a way to do this which only required specifying the subset of dimensions, something like `arr.flatten(dimensions=(0, 1))`. – Denziloe Aug 14 '20 at 01:01
  • 4
    @Denziloe One cannot simply 'flatten' an arbitrary dimension of an ndarray without specifying which dimension the extra data will be folded into. Take, for example, a 2x2x3 ndarray: flattening the last dimension can produce a 2x6 or a 6x2, so the information isn't redundant. You can specify the dimension with a -1. From [numpy.reshape](http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html): "One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions." So a 2x2xN reshaped to a 2Nx2 looks like this: `arr.reshape((-1,2))`. – אלימלך שרייבר Feb 14 '21 at 10:38
  • 2
    @Denziloe A way to achieve this may be something like `arr.reshape(arr.shape[0] * arr.shape[1], arr.shape[2])` – Adrien Pavao Jul 02 '21 at 12:03
  • 4
    @אלימלךשרייבר Funny, torch seems to manage this somehow: https://pytorch.org/docs/stable/generated/torch.flatten.html ;) – Sebastian Hoffmann May 07 '22 at 14:28
  • @SebastianHoffmann, numpy's [flatten](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html) would manage too. Flatten, as implied by the function's name, flattens the tensor/ndarray into a 1-D array, so there is no ambiguity that needs resolving. The issue discussed here, by contrast, is the flattening of a single dimension, e.g. a 6-D into a 5-D tensor/ndarray. [Torch Reshape](https://pytorch.org/docs/stable/generated/torch.reshape.html) needs the same specification in this regard. – אלימלך שרייבר May 08 '22 at 20:42
  • Thanks @AdrienPavao, I was looking for that! But what should one do when the dimensions to flatten are, for instance, dim=(0,2)? It's not clear to me; I guess swapping axes should do the job... – Rémy Hosseinkhan Boucher Aug 05 '22 at 14:58
  • @אלימלךשרייבר You can't flatten a single axis, but you could easily flatten two axes. As an example, you could flatten a 2x3x4 array like `np.flatten(arr, axes=(0,1))` to get a 6x4 array. – AccidentalTaylorExpansion Aug 15 '23 at 13:31
  • @AccidentalTaylorExpansion There is no np.flatten, and np.ndarray.flatten takes no axes argument (only an optional `order`). You might be thinking of reshape. – אלימלך שרייבר Aug 24 '23 at 22:42
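On the question above about flattening non-adjacent dimensions such as (0, 2): one possible approach, sketched here under the assumption that the merged axis should come first, is to move the two axes next to each other with `np.moveaxis` and then reshape. The array below is illustrative only.

```python
import numpy as np

# Illustrative array with distinct values so positions can be checked.
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Move axis 2 to position 1, so the axis order becomes (0, 2, 1)
# and the shape becomes (2, 4, 3).
moved = np.moveaxis(arr, 2, 1)

# Merge the first two axes (original axes 0 and 2) into one.
flat = moved.reshape(-1, moved.shape[-1])

print(flat.shape)  # (8, 3)
```

Element `arr[i, j, k]` ends up at `flat[i * 4 + k, j]`. Because the moved array is no longer contiguous in memory, this reshape returns a copy rather than a view.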
118

A slight generalization to Alexander's answer - np.reshape can take -1 as an argument, meaning "total array size divided by product of all other listed dimensions":

e.g. to flatten all but the last dimension:

>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)
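To make the meaning of -1 concrete, here is a small sketch that computes the inferred dimension by hand and compares it with what reshape produces. The variable names are illustrative.

```python
import numpy as np

arr = np.zeros((50, 100, 25))

# What -1 resolves to: total size divided by the product
# of the other listed dimensions.
inferred = arr.size // arr.shape[-1]
print(inferred)  # 5000

explicit = arr.reshape(inferred, arr.shape[-1])
implicit = arr.reshape(-1, arr.shape[-1])
print(explicit.shape == implicit.shape)  # True

# Only one dimension may be -1; two unknowns cannot be inferred.
try:
    arr.reshape(-1, -1)
except ValueError as err:
    print("ValueError:", err)
```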
Peter
72

A slight generalization to Peter's answer -- you can specify a range over the original array's shape if you want to go beyond three-dimensional arrays.

e.g. to flatten all but the last two dimensions:

import numpy
arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)

EDIT: A slight generalization to my earlier answer -- you can, of course, also specify a range at the beginning of the reshape:

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)
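The slicing pattern above can be wrapped in a small helper that merges any contiguous run of axes. This is only a sketch: `flatten_axes` is a hypothetical name, not a NumPy function, and it assumes `start` and `end` are valid axis indices (negative values count from the end).

```python
import math

import numpy as np


def flatten_axes(arr, start, end):
    """Merge axes start..end (inclusive) into a single axis.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Normalize negative indices such as -1 or -2.
    start %= arr.ndim
    end %= arr.ndim
    if start > end:
        raise ValueError("start must not come after end")
    # Size of the merged axis is the product of the merged dimensions.
    merged = math.prod(arr.shape[start:end + 1])
    return arr.reshape(arr.shape[:start] + (merged,) + arr.shape[end + 1:])


arr = np.zeros((3, 4, 5, 6, 7, 8))
print(flatten_axes(arr, 2, 3).shape)   # (3, 4, 30, 7, 8)
print(flatten_axes(arr, 0, 1).shape)   # (12, 5, 6, 7, 8)
print(flatten_axes(arr, 0, -2).shape)  # (2520, 8)
```

Computing the merged size explicitly instead of passing -1 keeps the helper working for arrays that contain a zero-length axis, where -1 cannot be inferred.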
KeithWM
    It's been more than two years already... We need another slight generalization! ;) – Lith Mar 27 '20 at 13:44
12

numpy.vstack is perfect for this situation:

import numpy as np
arr = np.ones((50,100,25))
np.vstack(arr).shape
> (5000, 25)

I prefer to use stack, vstack or hstack over reshape because reshape only reinterprets the data in its existing row-major order and never moves axes, so rows can end up mixing values from different sub-arrays. This can be problematic if you are, e.g., going to take column averages.

Here's an illustration of what I mean. Suppose we have the following array:

>>> arr.shape
(2, 3, 4)
>>> arr 
array([[[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[7, 7, 7, 7],
        [7, 7, 7, 7],
        [7, 7, 7, 7]]])

We apply both methods to get an array of shape (3,8):

>>> arr.reshape((3,8)).shape
(3, 8)
>>> np.hstack(arr).shape 
(3, 8)

However, if we look at how the data has been arranged in each case, hstack lets us take column sums that we could also have calculated from the original array. With reshape alone this isn't possible, because it does not reorder the axes.

>>> arr.reshape((3,8))
array([[1, 2, 3, 4, 1, 2, 3, 4],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [7, 7, 7, 7, 7, 7, 7, 7]])
>>> np.hstack(arr)
array([[1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7]])
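For comparison, the following sketch shows how the two stacking calls relate to reshape on the same example array: `np.vstack(arr)` gives the same result as a plain reshape, while `np.hstack(arr)` corresponds to swapping the first two axes before reshaping. The name `via_transpose` is illustrative.

```python
import numpy as np

# The example array from above, shape (2, 3, 4).
arr = np.array([[[1, 2, 3, 4]] * 3,
                [[7, 7, 7, 7]] * 3])

# vstack joins arr[0] and arr[1] along axis 0, which is exactly
# what a row-major reshape to (-1, 4) produces.
print(np.array_equal(np.vstack(arr), arr.reshape(-1, arr.shape[-1])))  # True

# hstack joins arr[0] and arr[1] along axis 1; the same result comes
# from swapping the first two axes and then reshaping.
via_transpose = arr.transpose(1, 0, 2).reshape(arr.shape[1], -1)
print(np.array_equal(np.hstack(arr), via_transpose))  # True
```

So the difference seen above comes from the axis order, not from reshape handling the data incorrectly.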
Sherman
6

An alternative approach is to use numpy.resize() as in:

In [36]: import numpy as np
In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)

# sanity check against the reshape-based solutions
In [47]: reshaped = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, reshaped)
Out[48]: True
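One caveat worth knowing, sketched below on a small illustrative array: `np.resize` and `reshape` agree when the total size matches, but they behave differently when it does not. `np.resize` silently repeats the data to fill the new shape, whereas `reshape` raises an error.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Matching total size: both give the same values.
print(np.array_equal(np.resize(arr, (3, 2)), arr.reshape(3, 2)))  # True

# reshape of a contiguous array returns a view on the same memory.
view = arr.reshape(3, 2)
print(np.shares_memory(arr, view))  # True

# Mismatched total size: np.resize fills the new shape by repeating data.
grown = np.resize(arr, (2, 4))
print(grown)
# [[0 1 2 3]
#  [4 5 0 1]]

# reshape refuses the same request.
try:
    arr.reshape(2, 4)
except ValueError as err:
    print("reshape refused:", err)
```

If a shape mistake should surface as an error rather than as repeated values, reshape is the safer choice.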
kmario23