
I am trying to concatenate two sparse matrices with the hstack function. xtrain_cat is the output of DictVectorizer (which encodes categorical values) and xtrain_num is numeric data loaded from a CSV file with pandas.

    xtrain_num = sparse.csr_matrix(xtrain_num)
    print type(xtrain_num)
    print xtrain_cat.shape
    print xtrain_num.shape
    x_train_data = hstack(xtrain_cat,xtrain_num)

Error :

(1000, 2778)
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>
(1000, 2778)
(1000, 968)
Traceback (most recent call last):
  File "D:\Projects\Zohair\Bosch\Bosch.py", line 360, in <module>
    x_train_data = hstack(xtrain_cat,xtrain_num)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 547, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D

Can someone identify what the problem is?

Ami Tavory
Zohair Zahid
  • Possible duplicate: http://stackoverflow.com/questions/19710602/concatenate-sparse-matrices-in-python-using-scipy-numpy?rq=1 – lskrinjar Sep 08 '16 at 11:37
  • You should specify which `hstack` you used, `np.hstack` or `sparse.hstack`. As the answer shows, this error is produced by the sparse version, run without the proper list of inputs. – hpaulj Sep 08 '16 at 17:13

1 Answer

You should try:

    x_train_data = hstack((xtrain_cat, xtrain_num))

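`hstack` takes a single sequence of blocks as its first argument, not the matrices as separate arguments. From the documentation:

    blocks : sequence of sparse matrices with compatible shapes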


When I define `a` to be a sparse matrix, I can reproduce your error by omitting the enclosing tuple, and the call succeeds once I add it:

In [19]: sparse.hstack(a, a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-7c450ab4fda0> in <module>()
----> 1 sparse.hstack(a, a)

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
    454 
    455     """
--> 456     return bmat([blocks], format=format, dtype=dtype)
    457 
    458 

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
    537 
    538     if blocks.ndim != 2:
--> 539         raise ValueError('blocks must be 2-D')
    540 
    541     M,N = blocks.shape

ValueError: blocks must be 2-D

In [20]: sparse.hstack((a, a))
Out[20]: 
<3x8 sparse matrix of type '<type 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>
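
For the shapes in the question, the fix can be sketched as follows. This is a minimal, self-contained example written for Python 3; the two matrices are hypothetical stand-ins (an empty matrix and a matrix of ones) that only mimic the shapes printed in the question, not the real data.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins with the shapes reported in the question.
xtrain_cat = sparse.csr_matrix((1000, 2778))          # empty 1000 x 2778 matrix
xtrain_num = sparse.csr_matrix(np.ones((1000, 968)))  # dense data converted to CSR

# Pass both blocks as ONE sequence (here a tuple); a list works as well.
x_train_data = sparse.hstack((xtrain_cat, xtrain_num), format="csr")

print(x_train_data.shape)  # (1000, 3746)
```

The column counts add up (2778 + 968 = 3746) while the row count stays at 1000. Passing `format="csr"` is optional; it is used here only so the result comes back in the same format as the inputs.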
Ami Tavory
  • Right, `sparse.hstack` is trying to treat the first argument as a list of matrices, or `blocks`. The `np.hstack` complains about the number of positional arguments. – hpaulj Sep 08 '16 at 17:16
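
The difference described in this comment can be checked with a small sketch. It assumes Python 3 and recent NumPy and SciPy releases; `error_name` is a hypothetical helper introduced only for this illustration, and the exact exception raised by `sparse.hstack` may differ between SciPy versions, so the code reports it instead of assuming it.

```python
import numpy as np
from scipy import sparse

dense = np.eye(3)
a = sparse.csr_matrix(dense)

def error_name(func, *args):
    """Call func(*args); return the exception class name, or None on success."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None

# np.hstack accepts one positional argument, so two arrays are rejected.
np_err = error_name(np.hstack, dense, dense)

# sparse.hstack treats the first argument as `blocks` and the second as `format`.
sp_err = error_name(sparse.hstack, a, a)

# Wrapping the matrices in a tuple is the form both functions expect.
ok = sparse.hstack((a, a))

print(np_err, sp_err, ok.shape)
```

In the SciPy versions shown in this thread, the sparse call fails with `ValueError: blocks must be 2-D`, while `np.hstack` raises a `TypeError` about the number of positional arguments.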