I have two sparse matrices created with sklearn's HashingVectorizer from two sets of features (each set corresponds to one feature). I want to concatenate them so I can later use the result for clustering. However, I am running into a dimension problem, because the two matrices do not have the same number of rows.
Here is an example:
Xa = [-0.57735027 -0.57735027 0.57735027 -0.57735027 -0.57735027 0.57735027
0.5 0.5 -0.5 0.5 0.5 -0.5 0.5
0.5 -0.5 0.5 -0.5 0.5 0.5 -0.5
0.5 0.5 ]
Xb = [-0.57735027 -0.57735027 0.57735027 -0.57735027 0.57735027 0.57735027
-0.5 0.5 0.5 0.5 -0.5 -0.5 0.5
-0.5 -0.5 -0.5 0.5 0.5 ]
Both Xa and Xb are of type <class 'scipy.sparse.csr.csr_matrix'>. Their shapes are Xa.shape = (6, 1048576) and Xb.shape = (5, 1048576). The error I get (I now understand why it happens) is:
X = hstack((Xa, Xb))
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.py", line 464, in hstack
return bmat([blocks], format=format, dtype=dtype)
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.py", line 581, in bmat
'row dimensions' % i)
ValueError: blocks[0,:] has incompatible row dimensions
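The mismatch can be reproduced without the vectorizer. The following is only a minimal sketch with small stand-in matrices (4 columns instead of 1048576, all values made up). It assumes nothing beyond scipy's hstack requiring equal row counts, and the exact error text may differ between scipy versions:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Stand-ins for the real matrices: same number of columns, different number of rows.
Xa = csr_matrix(np.ones((6, 4)))
Xb = csr_matrix(np.ones((5, 4)))

try:
    hstack((Xa, Xb))  # horizontal stacking needs matching row counts
    raised = False
except ValueError as err:
    raised = True
    print(err)  # message wording depends on the scipy version
```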
Is there a way to stack the sparse matrices despite their different row counts, perhaps by padding the shorter one?
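To make the padding idea concrete, here is a rough sketch of what I have in mind. It assumes that row i of both matrices refers to the same sample, and pad_rows is a hypothetical helper name of my own, not a scipy function. The shorter matrix gets all-zero rows appended with vstack before the hstack call. I am not sure an artificial zero row is meaningful for clustering:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

def pad_rows(X, n_rows):
    """Append all-zero rows to X until it has n_rows rows (hypothetical helper)."""
    missing = n_rows - X.shape[0]
    if missing <= 0:
        return X.tocsr()
    # An empty sparse block stores no values, so the padding costs almost no memory.
    padding = csr_matrix((missing, X.shape[1]), dtype=X.dtype)
    return vstack([X, padding], format="csr")

# Small made-up stand-ins with the same row mismatch as in the question.
Xa = csr_matrix(np.arange(1, 25, dtype=float).reshape(6, 4))
Xb = csr_matrix(np.arange(1, 21, dtype=float).reshape(5, 4))

n_rows = max(Xa.shape[0], Xb.shape[0])
X = hstack((pad_rows(Xa, n_rows), pad_rows(Xb, n_rows)), format="csr")
print(X.shape)  # (6, 8)
```

With the real matrices this would give a result of shape (6, 2097152), where the last row has zeros in the columns that come from Xb.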
I have looked into these posts: