I'm trying to build a sparse matrix like this:

         0   |   1   |   2   |
0        0   |[1,2,3]|[7,8,9]|
1     [4,5,6]|   0   |   0   |

using the csr_matrix from scipy.sparse in Python.

I build it as follows. The same call works when the data is a 1-D array of scalars.

csr_matrix(([[1,2,3],[7,8,9],[4,5,6]], ([0,0,1], [1,2,0])), shape=(2,3))

But I get the error ValueError: row, column, and data arrays must be 1-D

Is there another package that can do this?

Sorry for my bad English.

MlleStrife
  • One idea is to use a `pandas` data frame as a matrix. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html – tim Dec 03 '18 at 20:21
  • `scipy.sparse` is best for simple numeric values, so the matrices can be used for calculations like matrix products and linear algebra. So your idea doesn't fit. If you could construct such an array, what would you use it for? It's not even a good fit for `numpy` arrays or `pandas` dataframes (though possible). – hpaulj Dec 03 '18 at 20:31
  • There is a block matrix format that might fit your needs. I haven't used it much, but here's a recent answer using it - https://stackoverflow.com/questions/53574046/how-to-get-the-blocks-back-from-a-scipy-sparse-block-matrix/53574936#53574936 – hpaulj Dec 03 '18 at 21:53
  • I need it to be able to classify through Logistic Regression with 3 features, because I'm currently doing the same thing with 1 feature. – MlleStrife Dec 04 '18 at 08:05

1 Answer

Here's a bsr representation of your array.

Use an ordinary (2,3) csr matrix to find the indices and indptr for the blocks:

In [335]: M1 = sparse.csr_matrix([[0,1,1],[1,0,0]])
In [336]: M1.A
Out[336]: 
array([[0, 1, 1],
       [1, 0, 0]], dtype=int64)
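
To see what the helper matrix contributes, here is a small standalone sketch (not part of the original session) that prints the two index arrays the block matrix will reuse. Only the `M1` layout comes from the answer; the imports and prints are added for illustration.

```python
import numpy as np
from scipy import sparse

# 1 marks a position that will hold a block, 0 marks an empty position.
M1 = sparse.csr_matrix([[0, 1, 1], [1, 0, 0]])

# indices: column position of each stored entry, listed row by row.
print(M1.indices)

# indptr: row i owns the entries indices[indptr[i]:indptr[i + 1]].
print(M1.indptr)
```

For this layout, `indices` should come out as `[1, 2, 0]` and `indptr` as `[0, 2, 3]`: row 0 holds blocks in columns 1 and 2, and row 1 holds one block in column 0.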

Define the data blocks. I had to order them to match the M1 layout:

In [337]: data = np.arange(1,10).reshape(3,1,3)[[0,2,1],:]
In [338]: data
Out[338]: 
array([[[1, 2, 3]],

       [[7, 8, 9]],

       [[4, 5, 6]]])

Now make a bsr matrix:

In [339]: M = sparse.bsr_matrix((data, M1.indices, M1.indptr), shape=(2,9))
In [340]: M
Out[340]: 
<2x9 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements (blocksize = 1x3) in Block Sparse Row format>
In [341]: M.A
Out[341]: 
array([[0, 0, 0, 1, 2, 3, 7, 8, 9],
       [4, 5, 6, 0, 0, 0, 0, 0, 0]])

It represents a (2,9) matrix, but the values are stored as 3 (1,3) blocks. For display and most calculations, it is converted to a more conventional csr matrix.
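
Since the result is just a (2,9) matrix, the same thing can probably be built directly as a csr matrix by expanding each block into scalar entries. The following is a sketch under that assumption; the names `block_rows`, `block_cols`, `blocks` and `k` are made up for illustration, and `.toarray()` is used instead of `.A` because newer SciPy releases may not provide `.A`.

```python
import numpy as np
from scipy import sparse

# Block coordinates and block values taken from the question.
block_rows = np.array([0, 0, 1])
block_cols = np.array([1, 2, 0])
blocks = np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6]])

k = blocks.shape[1]  # number of values per block (3 here)

# A block at block position (r, c) fills columns c*k .. c*k + k - 1 of row r.
rows = np.repeat(block_rows, k)
cols = (block_cols[:, None] * k + np.arange(k)).ravel()

# All three arrays are now 1-D, which is what csr_matrix expects.
X = sparse.csr_matrix((blocks.ravel(), (rows, cols)), shape=(2, 3 * k))
print(X.toarray())
```

A plain csr matrix like `X` is the kind of input that estimators working on sparse feature matrices usually expect, which may be closer to the logistic regression use mentioned in the comments.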

This information could also be stored as dictionary:

In [349]: adict = {}
In [350]: adict[(0,1)] = data[0]
     ...: adict[(0,2)] = data[1]
     ...: adict[(1,0)] = data[2]
     ...: 
     ...: 
In [351]: adict
Out[351]: 
{(0, 1): array([[1, 2, 3]]),
 (0, 2): array([[7, 8, 9]]),
 (1, 0): array([[4, 5, 6]])}

sparse.dok_matrix is also a dict subclass. But it does not accept dtype=object, which would be the only way to store arrays as elements.
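
If the dictionary form is more convenient to build, it can be turned into a bsr matrix afterwards. The helper below, `blocks_to_bsr`, is a hypothetical function written for this sketch (it is not part of `scipy.sparse`), and it assumes at least one block and that every block has the same shape.

```python
import numpy as np
from scipy import sparse

def blocks_to_bsr(block_dict, grid_shape, block_shape):
    """Hypothetical helper: build a bsr_matrix from {(block_row, block_col): block}."""
    n_block_rows, n_block_cols = grid_shape
    r, c = block_shape
    keys = sorted(block_dict)  # row-major order, as bsr expects
    data = np.array([np.asarray(block_dict[key]).reshape(r, c) for key in keys])
    indices = np.array([j for _, j in keys], dtype=int)
    # Count the blocks in each block row, then accumulate to get indptr.
    counts = np.bincount(np.array([i for i, _ in keys], dtype=int),
                         minlength=n_block_rows)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return sparse.bsr_matrix((data, indices, indptr),
                             shape=(n_block_rows * r, n_block_cols * c))

# Same blocks as in the dictionary above.
adict = {
    (0, 1): np.array([[1, 2, 3]]),
    (0, 2): np.array([[7, 8, 9]]),
    (1, 0): np.array([[4, 5, 6]]),
}
M = blocks_to_bsr(adict, grid_shape=(2, 3), block_shape=(1, 3))
print(M.toarray())
```

Sorting the keys removes the need to order the data blocks by hand to match the layout.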

hpaulj