I have the following matrices:

A.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

type(A)

scipy.sparse.csr.csr_matrix

A.shape
(878049, 942)

And matrix B:

B

array([2248, 2248, 2248, ...,    0,    0,    0])

type(B)

numpy.ndarray

B.shape

(878049,)

I would like to column-stack A and B into C. I tried the following:

C =  sparse.column_stack([A,B])

Then:

/usr/local/lib/python3.5/site-packages/numpy/lib/shape_base.py in column_stack(tup)
    315             arr = array(arr, copy=False, subok=True, ndmin=2).T
    316         arrays.append(arr)
--> 317     return _nx.concatenate(arrays, 1)
    318 
    319 def dstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly

My problem is how to preserve the dimensions. Any idea how to column-stack them?

Update

I tried the following:

#Sorry for the name
C =  np.vstack(( A.A.T, B)).T

and I got:

array([[   0,    0,    0, ...,    0,    6],
       [   0,    0,    0, ...,    0,    6],
       [   0,    0,    0, ...,    0,    6],
       ..., 
       [   0,    0,    0, ...,    0,    1],
       [   0,    0,    0, ...,    0,    1],
       [   0,    0,    0, ...,    0,    1]], dtype=int64)

Is this the correct way to column-stack them?

john doe
  • Where did you find `sparse.column_stack`? There's `np.column_stack`, but not a sparse version. – hpaulj Jul 22 '16 at 05:20
  • All the answers below go unsparse. Check out this answer http://stackoverflow.com/a/33259578/2988730. It looks like exactly what you were looking for. I'm voting to close as dupe. – Mad Physicist Jul 22 '16 at 13:41
  • Possible duplicate of [Is there an efficient way of concatenating scipy.sparse matrices?](http://stackoverflow.com/questions/6844998/is-there-an-efficient-way-of-concatenating-scipy-sparse-matrices) – Mad Physicist Jul 22 '16 at 13:41

2 Answers

Two issues:

  • there is no sparse.column_stack
  • you are mixing a sparse matrix and a dense array

A smaller example:

In [129]: A=sparse.csr_matrix([[1,0,0],[0,1,0]])
In [130]: B=np.array([1,2])

Using np.column_stack gives your error:

In [131]: np.column_stack((A,B))
... 
ValueError: all the input array dimensions except for the concatenation axis must match exactly

But if I first turn A into a dense array (A.A is shorthand for A.toarray()), column_stack works:

In [132]: np.column_stack((A.A, B))
Out[132]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 2]])

The equivalent with concatenate:

In [133]: np.concatenate((A.A, B[:,None]), axis=1)
Out[133]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 2]])

There is a sparse.hstack. For that I need to turn B into a sparse matrix as well. The transpose works because B is now a 2-D matrix (as opposed to a 1-D array):

In [134]: sparse.hstack((A,sparse.csr_matrix(B).T))
Out[134]: 
<2x4 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in COOrdinate format>
In [135]: _.A
Out[135]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 2]], dtype=int32)
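
As a minimal sketch of the same idea, here is a version where the result stays in CSR format rather than COO. It relies on the `format` argument of `sparse.hstack` and uses `B[:, None]` to reshape the 1-D array into a column before wrapping it. The small `A` and `B` are the toy values from above, not the question's data, and the name `B_col` is only for illustration.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 0, 0], [0, 1, 0]])
B = np.array([1, 2])

# Reshape B to a (2, 1) column and wrap it as a sparse matrix.
B_col = sparse.csr_matrix(B[:, None])

# Stack horizontally and ask for CSR output instead of the default COO.
C = sparse.hstack((A, B_col), format="csr")

print(C.shape)      # (2, 4)
print(C.format)     # csr
print(C.toarray())
```

Nothing here converts A to a dense array, which is the point when A is as large as the one in the question.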
hpaulj

Did you try the following?

C=np.vstack((A.T,B)).T

With sample values:

>>> import numpy as np
>>> A = np.array([[1, 2, 3], [4, 5, 6]])
>>> A.shape
(2, 3)
>>> B = np.array([7, 8])
>>> B.shape
(2,)
>>> C = np.vstack((A.T, B)).T
>>> C.shape
(2, 4)

If A is a sparse matrix and you want the output to be sparse as well, you could convert back after stacking (note that A.A builds a dense copy of A first):

C = np.vstack((A.A.T, B)).T
D = sparse.csr_matrix(C)
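
One caveat worth quantifying: `A.A` materializes every cell, so the dense round trip costs memory for the zeros too. The rough estimate below assumes the shapes and dtype reported in the question (878049 rows, 942 columns plus one for B, int64). The helper name `dense_bytes` is made up for this illustration.

```python
import numpy as np


def dense_bytes(n_rows, n_cols, dtype=np.int64):
    """Bytes needed to hold an n_rows x n_cols dense array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Shapes from the question: A is (878049, 942), and C gains one column from B.
size = dense_bytes(878049, 943)
print(size / 1e9)  # roughly 6.6 (GB)
```

If that is more than the machine can hold, stacking with sparse.hstack avoids the dense intermediate.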
giosans