I've been working with some text data and I have a few sparse matrices and a few dense ones (numpy arrays). I just want to know how to combine them correctly.

These are the types and shape of the arrays:

list1 
<109248x9 sparse matrix of type '<class 'numpy.int64'>'
    with 152643 stored elements in Compressed Sparse Row format>

list2
<109248x3141 sparse matrix of type '<class 'numpy.int64'>'
    with 350145 stored elements in Compressed Sparse Row format>

list3.shape   ,  type(list3)
(109248, 300) ,  numpy.ndarray

list4.shape   ,  type(list4)
(109248, 51)  ,  numpy.ndarray

I just want to combine all of them together as one dense matrix. I tried some vstack and hstack but couldn't figure it out. Any help is much appreciated.

Output required: (109248, 3501)
Jeeth
  • Turn the sparse ones to dense, e.g. `list1.A`, then `hstack` the list of all 4 – hpaulj Feb 22 '19 at 08:42
  • @hpaulj I tried to use `list2.toarray()` but I'm getting a `memory error`. Is it possible to combine them directly without actually converting those to dense? – Jeeth Feb 22 '19 at 08:47
  • Make dense arrays sparse, and use `sparse.hstack` – hpaulj Feb 22 '19 at 08:50
  • See https://stackoverflow.com/questions/16505670/generating-a-dense-matrix-from-a-sparse-matrix-in-numpy-python/16505766#16505766 for how to turn a sparse matrix into a dense one – Rachel Gallen Feb 22 '19 at 10:35

1 Answer

sparse.hstack can join sparse and dense arrays. It first converts every input to a coo format matrix, builds new composite data, row and col arrays from them, and returns a coo matrix (optionally converted to another format if you specify one):

In [378]: import numpy as np; from scipy import sparse
In [379]: M=sparse.random(10,10,.2,'csr')
In [380]: M
Out[380]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 20 stored elements in Compressed Sparse Row format>
In [381]: A=np.ones((10,2),float)
In [382]: sparse.hstack([M,A])
Out[382]: 
<10x12 sparse matrix of type '<class 'numpy.float64'>'
    with 40 stored elements in COOrdinate format>
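
Applied to the four arrays in the question, the same idea might look like the sketch below. It is only an illustration: the data is random stand-in data, `n_rows` is a hypothetical smaller row count so it runs quickly, and the densities are made up. The dense arrays are converted to sparse first, as suggested in the comments, and everything is joined without ever building the full dense matrix (which at 109248 x 3501 and 8 bytes per element would need roughly 3 GB).

```python
import numpy as np
from scipy import sparse

# Assumption: a reduced row count standing in for the 109248 rows in the question.
n_rows = 1000

# Stand-ins for the two sparse matrices (densities are made up for illustration).
list1 = sparse.random(n_rows, 9, density=0.15, format="csr")
list2 = sparse.random(n_rows, 3141, density=0.001, format="csr")

# Stand-ins for the two dense numpy arrays.
rng = np.random.default_rng(0)
list3 = rng.random((n_rows, 300))
list4 = rng.random((n_rows, 51))

# Make the dense arrays sparse, then stack all four column-wise.
# format="csr" asks hstack to convert the coo result to csr.
combined = sparse.hstack(
    [list1, list2, sparse.csr_matrix(list3), sparse.csr_matrix(list4)],
    format="csr",
)

print(combined.shape)  # 9 + 3141 + 300 + 51 = 3501 columns
```

The result stays sparse. If a dense array is really required, `combined.toarray()` produces one, but that is the step that needs enough memory for the whole matrix.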
hpaulj