Is there a way to form a sparse n-dimensional array in Python 3?

Question

I am pretty new to Python and have been wondering whether there is an easy way to form a sparse n-dimensional array M in Python 3 that meets the following two conditions (along the lines of SciPy's coo_matrix):

M[dim1,dim2,dim3,...] = 1.0
With a SciPy coo_matrix M, M.row and M.col give all the row and column indices for which non-zero entries exist in the matrix. In N dimensions, this would generalize to something like M.1 for the 1st dimension, M.2 for the 2nd dimension, and so on.

For 2 dimensions, the two conditions are:

 1. for u, i in data:
        mat[u, i] = 1.0

 2. def get_triplets(mat):
        return mat.row, mat.col

Can these 2 conditions be generalized to N dimensions? I searched and came across this:

sparse 3d matrix/array in Python?

But there the 2nd condition is not satisfied: I can't get all the indices for the nth dimension in a vectorized format.

Also, this: http://www.janeriksolem.net/sparray-sparse-n-dimensional-arrays-in.html works for Python 2 but not Python 3.

Is there a way to implement n-dimensional arrays that satisfy the 2 conditions above? Or am I over-complicating things? I appreciate any help with this :)

You could certainly create a data structure modeled on either `coo` (column per dimension) or `dok`. And you could fill it in a way that meets your conditions. But whether you can do anything useful with it (multiplication, display, etc) without doing a lot of coding is a tougher question. For a start, demonstrate your conditions using the `scipy.sparse` 2d code. — hpaulj, Mar 16 '17 at 22:53
Scipy has different ways to initialize sparse matrices, and they can be converted into each other. But you seem to be looking for a matrix that is sparse but with 1 being the value of any sparse element. That won't work at all, as the optimizations of sparse matrices are based on the fact that sparse cells are zero. — RuDevel, Mar 16 '17 at 22:54
@hpaulj, I edited based on your comment. Hope it's a bit clearer now. I am reading your answer now. — learner, Mar 16 '17 at 23:39
@RuDevel, I did not mean that. M is a sparse N-dimensional array with non-zero values as 1.0 and all other values as 0.0 — learner, Mar 16 '17 at 23:45

score 1 · Accepted Answer · edited May 23 '17 at 12:09

In the spirit of the coo format, I could generate a 3d sparse array representation (with numpy imported as np):

In [106]: dims = 2,4,6
In [107]: data = np.zeros((10,4),int)
In [108]: data[:,-1] = 1
In [112]: for i in range(3):
     ...:     data[:,i] = np.random.randint(0,dims[i],10)

In [113]: data
Out[113]: 
array([[0, 2, 3, 1],
       [0, 3, 4, 1],
       [0, 0, 1, 1],
       [0, 3, 0, 1],
       [1, 1, 3, 1],
       [1, 0, 2, 1],
       [1, 1, 2, 1],
       [0, 2, 5, 1],
       [0, 1, 5, 1],
       [0, 1, 2, 1]])

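Does that meet your requirements? Each row holds the three indices followed by the data value, so data[:,0], data[:,1] and data[:,2] are the per-dimension index vectors. It's possible there are some duplicates. A scipy.sparse coo matrix sums duplicates when it is converted to dense for display, or to csr for calculations.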
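As a rough sketch of the second condition, the per-dimension index vectors can be pulled out of such a coordinate array as columns. The helper name get_indices and the small data array here are made up for illustration; only the layout (indices first, value last) follows the example above.

```python
import numpy as np

# Hypothetical example data: each row is (i0, i1, i2, value), laid out like `data` above.
data = np.array([[0, 2, 3, 1],
                 [0, 3, 4, 1],
                 [1, 0, 2, 1],
                 [1, 1, 2, 1]])

def get_indices(data):
    """Return one index vector per dimension (the N-d analogue of M.row, M.col)."""
    # Drop the value column, transpose, and split into one array per dimension.
    return tuple(data[:, :-1].T)

idx0, idx1, idx2 = get_indices(data)
print(idx0, idx1, idx2)

# Duplicate coordinates can be detected by comparing against the unique rows.
coords = data[:, :-1]
n_unique = len(np.unique(coords, axis=0))
print("duplicates present:", n_unique < len(coords))
```

The same function works for any number of dimensions, since it only assumes that the last column holds the values.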

The corresponding dense array is:

In [130]: A=np.zeros(dims, int)
In [131]: for row in data:
     ...:     A[tuple(row[:3])] += row[-1]

In [132]: A
Out[132]: 
array([[[0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1],
        [0, 0, 0, 1, 0, 1],
        [1, 0, 0, 0, 1, 0]],

       [[0, 0, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]])

(no duplicates in this case).
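If the row-by-row loop is too slow, the same accumulation can probably be done in one call with np.add.at, which adds in place without buffering, so repeated coordinates are summed just like in the loop. The data below is a made-up example with one deliberate duplicate.

```python
import numpy as np

dims = (2, 4, 6)
# Hypothetical coordinate list; the coordinate (0, 1, 2) appears twice.
data = np.array([[0, 1, 2, 1],
                 [1, 3, 5, 1],
                 [0, 1, 2, 1]])

A = np.zeros(dims, dtype=int)
# Unbuffered in-place add: repeated coordinates accumulate.
np.add.at(A, tuple(data[:, :-1].T), data[:, -1])

print(A[0, 1, 2])  # the duplicated coordinate
print(A[1, 3, 5])
```

Note that plain fancy-index assignment, A[idx] += values, would not accumulate the repeated coordinate, which is why np.add.at is used here.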

A 2d sparse matrix using a subset of this data (with from scipy import sparse) is:

In [118]: sparse.coo_matrix((data[:,3],(data[:,1],data[:,2])),(4,6)).toarray()
Out[118]: 
array([[0, 1, 1, 0, 0, 0],
       [0, 0, 2, 1, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]])

That's in effect the sum over the first dimension.
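A quick way to check that claim, sketched with a small made-up data array rather than the random one above: build the dense 3d array, build the 2d coo_matrix from the last two index columns, and compare against the sum over axis 0.

```python
import numpy as np
from scipy import sparse

dims = (2, 4, 6)
# Hypothetical data: the first two rows share (dim1, dim2) but differ in dim0.
data = np.array([[0, 2, 3, 1],
                 [1, 2, 3, 1],
                 [1, 0, 5, 1]])

# Dense 3d array built from the coordinate list.
A = np.zeros(dims, dtype=int)
np.add.at(A, tuple(data[:, :3].T), data[:, 3])

# 2d sparse matrix that ignores the first index column.
M = sparse.coo_matrix((data[:, 3], (data[:, 1], data[:, 2])), shape=dims[1:])

# Duplicate (row, col) pairs are summed on conversion, matching the axis-0 sum.
same = np.array_equal(M.toarray(), A.sum(axis=0))
print(same)
```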

I'm assuming that

M[dim1,dim2,dim3,...] = 1.0

means the non-zero elements of the array must have a data value of 1.
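Under that assumption, both conditions could be met with a small dok-style container. This is only a sketch: SparseND and its indices method are hypothetical names, not part of SciPy, and it supports nothing beyond item assignment and index retrieval.

```python
import numpy as np

class SparseND:
    """Hypothetical minimal N-d sparse container (not a SciPy class)."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._data = {}  # maps a coordinate tuple to its value

    def __setitem__(self, index, value):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        # Assigning to the same coordinate again overwrites the old value.
        self._data[tuple(int(i) for i in index)] = value

    def indices(self):
        """One index array per dimension, like M.row / M.col."""
        if not self._data:
            return tuple(np.empty(0, dtype=int) for _ in self.shape)
        return tuple(np.array(list(self._data), dtype=int).T)

# Condition 1: element-wise assignment; (0, 2, 3) is assigned twice.
M = SparseND((2, 4, 6))
for u, i, j in [(0, 2, 3), (1, 0, 2), (0, 2, 3)]:
    M[u, i, j] = 1.0

# Condition 2: per-dimension index vectors.
dim0, dim1, dim2 = M.indices()
print(dim0, dim1, dim2)
```

Because a dict is used, a repeated coordinate is stored once rather than summed, which differs from the coo behaviour described earlier.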

Pandas has a sparse data series and data frame format. That allows for a non-zero 'fill' value. I don't know if the multi-index version can be thought of as higher than 2d or not. There have been a few SO questions about converting the Pandas sparse arrays to/from the scipy sparse.

Convert Pandas SparseDataframe to Scipy sparse csc_matrix

http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
