5

I need to hstack multiple arrays that have the same number of rows (although the number of rows varies between uses) but different numbers of columns. However, some of the arrays have only one column, e.g.

array = np.array([1,2,3,4,5])

which gives

#array.shape = (5,)

but I'd like to have the shape recognized as a 2d array, e.g.

#array.shape = (5,1)

So that hstack can actually combine them. My current solution is:

array = np.atleast_2d([1,2,3,4,5]).T
#array.shape = (5,1)

So I was wondering, is there a better way to do this? Would

array = np.array([1,2,3,4,5]).reshape(len([1,2,3,4,5]), 1)

be better? Note that my use of [1,2,3,4,5] is just a toy list to make the example concrete. In practice it will be a much larger list passed into a function as an argument. Thanks!

Taaam

4 Answers

5

Check the code of hstack and vstack. Both pass their arguments through one of the atleast_nd functions (atleast_1d for hstack, atleast_2d for vstack). That is a perfectly acceptable way of reshaping an array.

Some other ways:

arr = np.array([1,2,3,4,5]).reshape(-1,1)  # saves the use of len()
arr = np.array([1,2,3,4,5])[:,None]  # adds a new dim at end
np.array([1,2,3],ndmin=2).T  # used by column_stack

hstack and vstack transform their inputs with:

arrs = [atleast_1d(_m) for _m in tup]  # in hstack
[atleast_2d(_m) for _m in tup]  # in vstack
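
To see what those two helpers do to a 1d input, here is a small sketch (the variable names are only for illustration). atleast_1d leaves a 1d array alone, while atleast_2d adds the new axis in front, which gives a row of shape (1, N) rather than a column. That is why the question's solution needs the trailing .T.

```python
import numpy as np

v = np.arange(5)               # shape (5,)

as_1d = np.atleast_1d(v)       # unchanged: already has one dimension
as_2d = np.atleast_2d(v)       # new axis is added in front, giving a single row

print(as_1d.shape)             # (5,)
print(as_2d.shape)             # (1, 5)
print(as_2d.T.shape)           # (5, 1), the column shape the question asks for
```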

test data:

a1=np.arange(2)
a2=np.arange(10).reshape(2,5)
a3=np.arange(8).reshape(2,4)

np.hstack([a1.reshape(-1,1),a2,a3])
np.hstack([a1[:,None],a2,a3])
np.column_stack([a1,a2,a3])

result:

array([[0, 0, 1, 2, 3, 4, 0, 1, 2, 3],
       [1, 5, 6, 7, 8, 9, 4, 5, 6, 7]])

If you don't know ahead of time which arrays are 1d, then column_stack is easiest to use. The others require a little function that tests for dimensionality before applying the reshaping.
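
A minimal sketch of such a "little function", assuming the inputs are either 1d or already 2d. The name as_column is hypothetical, not part of NumPy.

```python
import numpy as np

def as_column(a):
    """Turn 1d input into an (N, 1) column; leave other inputs unchanged."""
    a = np.asarray(a)
    if a.ndim == 1:
        return a.reshape(-1, 1)   # (N,) -> (N, 1)
    return a                      # anything that is not 1d is passed through

a1 = np.arange(2)
a2 = np.arange(10).reshape(2, 5)
a3 = np.arange(8).reshape(2, 4)

stacked = np.hstack([as_column(a) for a in (a1, a2, a3)])
print(stacked.shape)              # (2, 10)
```

With the test data above this should give the same array as np.column_stack([a1, a2, a3]).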

Numpy: use reshape or newaxis to add dimensions

hpaulj
  • Interesting, I didn't know about these two ways. I will try those with some timing tests to see which performs best, but both seem less contrived than my method. Thanks! – Taaam Feb 16 '15 at 18:26
  • I added a link to a recent related SO question. – hpaulj Feb 16 '15 at 18:58
  • The latest version has added a more general purpose `stack`. – hpaulj May 23 '16 at 01:37
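
For reference, a short sketch of np.stack with two 1d arrays of equal length (the arrays a and b are made up for illustration). Note that np.stack requires all inputs to have the same shape, so it does not by itself handle the mixed 1d/2d case from the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

cols = np.stack([a, b], axis=1)   # new axis 1: each input becomes a column
rows = np.stack([a, b], axis=0)   # new axis 0: each input becomes a row

print(cols.shape)                 # (5, 2)
print(rows.shape)                 # (2, 5)
```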
1

If I understand your intent correctly, you wish to convert an array of shape (N,) to an array of shape (N,1) so that you can apply np.hstack:

In [147]: np.hstack([np.atleast_2d([1,2,3,4,5]).T, np.atleast_2d([1,2,3,4,5]).T])
Out[147]: 
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])

In that case, you could avoid reshaping the arrays and use np.column_stack instead:

In [151]: np.column_stack([[1,2,3,4,5], [1,2,3,4,5]])
Out[151]: 
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
unutbu
  • Thanks, I'm actually using this with scikit-learn, and they use hstack internally, so I do need them to be row-oriented. – Taaam Feb 16 '15 at 18:25
1

I followed Ludo's work and just changed the size of v from 5 to 10000. I ran the code on my PC and the result shows that atleast_2d seems to be a more efficient method in the larger scale case.

import numpy as np
import timeit

v = np.arange(10000)

print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1)))  # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None]))  # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T))  # used by column_stack

The result is:

atleast2d: 1.3809496470021259
reshape: 27.099974197000847
v[:,None]: 28.58291715100131
np.array(v,ndmin=2).T: 30.141663907001202

My suggestion is to use [:, None] when dealing with a short vector and np.atleast_2d when your vector is longer.
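
One possible explanation for the gap, offered as an assumption rather than something measured above: v is already an ndarray here, so np.atleast_2d(v) can return a view, while np.array(v) copies all 10000 elements on every call by default. The sketch below checks this with np.shares_memory. Using np.asarray, which does not copy an existing array, would be one way to time the reshaping step on its own.

```python
import numpy as np

v = np.arange(10000)

via_atleast_2d = np.atleast_2d(v).T        # no copy: a view onto v
via_array = np.array(v).reshape(-1, 1)     # np.array copies v by default
via_asarray = np.asarray(v)[:, None]       # np.asarray reuses v, then adds an axis

print(np.shares_memory(v, via_atleast_2d))  # True
print(np.shares_memory(v, via_array))       # False
print(np.shares_memory(v, via_asarray))     # True
```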

Ding Zhao
0

Just to add info to hpaulj's answer: I was curious how fast the four methods described are. The winner is the method that adds a new axis at the end of the 1d array (v[:,None]).

Here is what I ran:

import numpy as np
import timeit

v = [1,2,3,4,5]

print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1)))  # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None]))  # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T))  # used by column_stack

And the results:

atleast2d: 4.455070924214851
reshape: 2.0535152913971615
v[:,None]: 1.8387219828073285
np.array(v,ndmin=2).T: 3.1735243063353664
Ludo