1

I would like to convert a numpy array with dtype=object to a sparse matrix, e.g. a csr_matrix. However, this fails.

import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

x = np.array(['a', 'b', 'c'], dtype=object)

csr_matrix(x)  # This fails
csc_matrix(x)  # This fails

Both constructor calls raise the following error:

TypeError: no supported conversion for types: (dtype('O'),)

In fact, even calling

csr_matrix(['a', 'b', 'c'])

produces the same error. Do sparse matrices not support object dtypes?

sascha
Pavlin
  • Can a sparse matrix contain non numerical elements ? – Espoir Murhabazi Dec 16 '17 at 12:04
  • What is a `zero` element in object dtype? The `csr` math won't work with objects. It is compiled with a limited set of numeric types. What do you expect to do with such a matrix? Even strings don't work. – hpaulj Dec 16 '17 at 13:16
  • Well, I would expect `None` to be the `zero` element. However, it does make sense to only work with numeric types. – Pavlin Dec 16 '17 at 16:53

2 Answers

3

I don't think this is supported. The documentation is a bit sparse on this point, but this part of the source shows which types the compiled routines are built for:

# List of the supported data typenums and the corresponding C++ types
#
T_TYPES = [
    ('NPY_BOOL', 'npy_bool_wrapper'),
    ('NPY_BYTE', 'npy_byte'),
    ('NPY_UBYTE', 'npy_ubyte'),
    ('NPY_SHORT', 'npy_short'),
    ('NPY_USHORT', 'npy_ushort'),
    ('NPY_INT', 'npy_int'),
    ('NPY_UINT', 'npy_uint'),
    ('NPY_LONG', 'npy_long'),
    ('NPY_ULONG', 'npy_ulong'),
    ('NPY_LONGLONG', 'npy_longlong'),
    ('NPY_ULONGLONG', 'npy_ulonglong'),
    ('NPY_FLOAT', 'npy_float'),
    ('NPY_DOUBLE', 'npy_double'),
    ('NPY_LONGDOUBLE', 'npy_longdouble'),
    ('NPY_CFLOAT', 'npy_cfloat_wrapper'),
    ('NPY_CDOUBLE', 'npy_cdouble_wrapper'),
    ('NPY_CLONGDOUBLE', 'npy_clongdouble_wrapper'),
]

Asking for object-based types sounds like a lot. Even some more basic types like float16 are missing.
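For a numeric dtype that is not in the list, casting to a listed type before building the matrix avoids relying on whatever conversion SciPy may or may not do on its own. A minimal sketch, assuming float32 is an acceptable precision for the data (the array values here are made up for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative float16 data; float16 has no entry in T_TYPES above.
x16 = np.array([[0.0, 1.5, 0.0],
                [2.0, 0.0, 0.0]], dtype=np.float16)

# Cast explicitly to a supported type (NPY_FLOAT corresponds to float32).
x32 = x16.astype(np.float32)

M = csr_matrix(x32)
print(M.dtype, M.nnz)
```

This only helps for dtypes that have a sensible numeric cast. It does nothing for object arrays holding arbitrary Python values.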

sascha
  • Ah, thank you! I was quite puzzled at this behaviour, and I guessed it must be something like this. Thanks for confirming! – Pavlin Dec 16 '17 at 12:08
3

It is possible to create a coo format matrix from your x:

In [22]: x = np.array([['a', 'b', 'c']], dtype=object)
In [23]: M=sparse.coo_matrix(x)
In [24]: M
Out[24]: 
<1x3 sparse matrix of type '<class 'numpy.object_'>'
    with 3 stored elements in COOrdinate format>
In [25]: M.data
Out[25]: array(['a', 'b', 'c'], dtype=object)

coo has just flattened the input array and assigned it to its data attribute. (row and col have the indices).
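The same row/col/data layout is easier to see with a supported numeric dtype. A small sketch with made-up integer values (not the object array from the question), showing that coo keeps three parallel arrays, one entry per stored element:

```python
import numpy as np
from scipy import sparse

# Illustrative numeric input with two nonzero entries.
a = np.array([[0, 5, 0],
              [7, 0, 0]])

M = sparse.coo_matrix(a)

# Each stored element is described by (row[i], col[i], data[i]).
triples = sorted(zip(M.row.tolist(), M.col.tolist(), M.data.tolist()))
print(triples)
```

Since coo only has to hold these three arrays, building one needs no compiled math on the values, which would explain why the object array gets as far as it does here.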

In [31]: M=sparse.coo_matrix(x)
In [32]: print(M)
  (0, 0)    a
  (0, 1)    b
  (0, 2)    c

But displaying it as an array produces an error.

In [26]: M.toarray()
ValueError: unsupported data types in input

Trying to convert it to other formats produces your TypeError.

dok sort of works:

In [28]: M=sparse.dok_matrix(x)
/usr/local/lib/python3.5/dist-packages/scipy/sparse/sputils.py:114: UserWarning: object dtype is not supported by sparse matrices
  warnings.warn("object dtype is not supported by sparse matrices")
In [29]: M
Out[29]: 
<1x3 sparse matrix of type '<class 'numpy.object_'>'
    with 3 stored elements in Dictionary Of Keys format>

A string dtype, e.g. x.astype('U1'), works a little better, but still has problems with conversion to csr.

Sparse matrices were developed for large linear algebra problems. The ability to do matrix multiplication and solve linear equations was what mattered most. Their application to non-numeric tasks is recent, and incomplete.
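If the goal is only to store labels sparsely, one possible workaround (not something scipy provides, just a sketch) is to encode each distinct object as a positive integer and keep 0 for "missing", which matches the idea in the comments of treating None as the zero element. The input array, the `vocab` lookup and the `decode` helper below are all hypothetical names introduced for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical input: None marks a missing cell.
x = np.array([["a", None, "b"],
              [None, None, "a"]], dtype=object)

# Boolean mask of the cells that actually hold a value.
mask = np.array([[v is not None for v in row] for row in x])
rows, cols = np.nonzero(mask)

# Map each distinct label to an integer code; shift by 1 so 0 means "missing".
vocab, inverse = np.unique(x[mask].astype(str), return_inverse=True)
codes = inverse + 1

# The codes are plain integers, so csr handles them without complaint.
M = csr_matrix((codes, (rows, cols)), shape=x.shape)

def decode(code):
    """Translate a stored integer code back to its label (None for 0)."""
    return None if code == 0 else vocab[code - 1]

print(M.toarray())
print(decode(M[0, 2]), decode(M[0, 1]))
```

Arithmetic on such a matrix is meaningless, of course. The codes are only useful for storage, indexing and comparison, and this sketch assumes every label can be represented as a string.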

hpaulj