88

In scipy, we can construct a sparse matrix using scipy.sparse.lil_matrix() etc. But these matrices are only 2d.

I am wondering if there is an existing data structure for sparse 3d matrix / array (tensor) in Python?

p.s. I have lots of sparse data in 3d and need a tensor to store / perform multiplication. Any suggestions to implement such a tensor if there's no existing data structure?

zhongqi
  • 1,127
  • 2
  • 11
  • 11
  • this post might help http://stackoverflow.com/questions/4490961/numpy-multiplying-a-matrix-with-a-3d-tensor-suggestion – jayunit100 Oct 09 '11 at 04:08
  • ...but not sparse, unfortunately. – Steve Tjoa Oct 10 '11 at 18:16
  • What do you mean by a "matrix in 2D"? If you mean a matrix representing a 2D linear transformation, then you're talking about a 2x2 matrix of Real values (approximated by floating point values) with determinant 1 for a rigid rotation. If you want to represent translation as well then you embed the 2x2 matrix inside a 3x3 matrix, and if you want to allow shearing or expansion/contraction then you can relax the determinant requirement -but even so that's a total of 9 floating point values. Why do you want/need a sparse representation? – Peter Oct 12 '11 at 15:31
  • @Peter "a matrix in 2D" means a matrix in 2 dimension. A unit in a 2d matrix can be represented as (x,y, r), where x & y are the coordinate and r is the value stored at (x, y). I need a sparse representation because when x & y are very very large, say x<10^5, y < 10^4, AND only very few data are stored in the matrix, say 10^4. numpy provides sparse matrix for the 2d matrix. But very often, we need 3d or even n-d. I guess n-d case is too general. So any solutions to 3d are good enough for me. – zhongqi Oct 12 '11 at 15:58
  • Thanks - I was confused by the P.S. in your question (it sounded to me like you wanted to multiply a bunch of Euclidean tuples by a matrix, linear algebra style). But if you're talking about m x n x o matrices then it sounds like your "sparse" implementation is going to need to provide some sort of iterator interface in order for you to implement (element by element) multiplication. – Peter Oct 12 '11 at 17:39

6 Answers

16

Happy to suggest a (possibly obvious) implementation of this, which could be made in pure Python or C/Cython if you've got time and space for new dependencies, and need it to be faster.

A sparse matrix in N dimensions can assume most elements are empty, so we use a dictionary keyed on tuples:

class NDSparseMatrix:
  def __init__(self):
    self.elements = {}

  def addValue(self, key, value):
    # key is a tuple of indices, e.g. (i, j, k)
    self.elements[key] = value

  def readValue(self, key):
    try:
      value = self.elements[key]
    except KeyError:
      # could also be 0.0 if using floats...
      value = 0
    return value

and you would use it like so:

sparse = NDSparseMatrix()
sparse.addValue((1,2,3), 15.7)
should_be_zero = sparse.readValue((1,5,13))

You could make this implementation more robust by verifying that the input is in fact a tuple, and that it contains only integers, but that will just slow things down so I wouldn't worry unless you're releasing your code to the world later.
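If you do want that robustness later, a minimal checked variant might look like this (the `CheckedNDSparseMatrix` name and the exact checks are my own illustration, not part of the answer above):

```python
class CheckedNDSparseMatrix:
    """Dict-of-keys sparse array that validates its keys (hypothetical name)."""
    def __init__(self, ndim):
        self.ndim = ndim
        self.elements = {}

    def _check(self, key):
        if not isinstance(key, tuple):
            raise TypeError("index must be a tuple")
        if len(key) != self.ndim:
            raise ValueError("expected %d indices, got %d" % (self.ndim, len(key)))
        if not all(isinstance(i, int) for i in key):
            raise ValueError("indices must be integers")

    def addValue(self, key, value):
        self._check(key)
        self.elements[key] = value

    def readValue(self, key):
        self._check(key)
        return self.elements.get(key, 0)
```

The cost is one validation pass per access, which is exactly the slowdown warned about above.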

EDIT - a Cython implementation of the matrix multiplication problem, assuming the other tensor is an N-dimensional NumPy array (numpy.ndarray), might look like this:

#cython: boundscheck=False
#cython: wraparound=False

import numpy as np
cimport numpy as np

def sparse_mult(object sparse, np.ndarray[double, ndim=3] u):
  cdef unsigned int i, j, k

  out = np.zeros(shape=(u.shape[0], u.shape[1], u.shape[2]), dtype=np.double)

  for i in range(1, u.shape[0] - 1):
    for j in range(1, u.shape[1] - 1):
      for k in range(1, u.shape[2] - 1):
        # note, here you must define your own rank-3 multiplication rule, which
        # is, in general, nontrivial, especially if LxMxN tensor...

        # loop over a dummy variable (or two) and perform some summation:
        out[i, j, k] = u[i, j, k] * sparse.readValue((i, j, k))

  return out

You will always need to hand-roll this for the problem at hand, because (as mentioned in the code comment) you must define which indices you're summing over, and be careful about the array lengths or things won't work!
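As one concrete example of such a rule (my own illustration, not from the original answer): contracting the dict-of-keys tensor with a vector over its last index, visiting only the stored entries:

```python
import numpy as np

def contract_last_index(elements, v, shape):
    # out[i, j] = sum_k A[i, j, k] * v[k], where A is given as a
    # dict mapping (i, j, k) tuples to values; cost scales with the
    # number of stored entries, not with the full tensor size
    out = np.zeros(shape[:2])
    for (i, j, k), value in elements.items():
        out[i, j] += value * v[k]
    return out

A = {(0, 1, 2): 2.0, (0, 1, 0): 3.0, (1, 0, 1): 4.0}
v = np.array([1.0, 2.0, 3.0])
out = contract_last_index(A, v, (2, 2, 3))
# out[0, 1] == 2.0 * 3.0 + 3.0 * 1.0 == 9.0
```

This is the "scale with the number of dense elements" behavior discussed in the comments below.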

EDIT 2 - if the other matrix is also sparse, then you don't need to do the three-way looping:

def sparse_mult(sparse, other_sparse):

  out = NDSparseMatrix()

  for key, value in sparse.elements.items():
    i, j, k = key
    # note, here you must define your own rank-3 multiplication rule, which
    # is, in general, nontrivial, especially if LxMxN tensor...

    # loop over a dummy variable (or two) and perform some summation
    # (example indices shown):
    out.addValue(key, out.readValue(key) +
                 other_sparse.readValue((i, j, k + 1)) * sparse.readValue((i - 3, j, k)))

  return out

My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:

typedef struct {
  int index[3];
  float value;
} entry_t;

You'll then need some functions to allocate and maintain a dynamic array of such structs, and to search them as fast as you need; but you should test the Python implementation in place for performance before worrying about that stuff.

tehwalrus
  • 2,589
  • 5
  • 26
  • 33
  • 5
    The problem is the mathematical operations, not the data container... I've never heard of algorithms for efficient sparse N-d tensor products. Have a look at `scipy.sparse.dok_matrix`. It's what you're describing here, just limited to 2D. It's easy enough to extend it to hold N-D data, but how do you operate on the data?? (That having been said, your answer is entirely reasonable...) – Joe Kington Oct 11 '11 at 15:52
  • ah, I've misunderstood? so this question is asking more about the implementation of a scipy compatible matrix multiplication operation? This should be relatively easy to implement, surely, since all you really need for that is an in-loop query for the value at an index, which I've provided. I'll take a look at the scipy specifications though. – tehwalrus Oct 11 '11 at 15:58
  • 4
    Well, arguably, I've misunderstood as well. Either way, my point was that you're not taking advantage of the sparsity structure when doing operations. What you've described in your edit treats it like a dense array. (Which certainly works! Your answer solves the problem at hand.) Sparse matrix libraries take advantage of the sparseness of the array, and avoid things like looping over every element of the array, regardless of "sparsity". That's the main point of using a sparse matrix. Operations roughly scale with the number of "dense" elements, not the overall dimensions of the matrix. – Joe Kington Oct 11 '11 at 19:58
  • @tehwalrus Thanks for the reply. But I am afraid the multiplication with your suggested data structure may not be very efficient... – zhongqi Oct 12 '11 at 11:34
  • @JoeKington You'd have to loop over every element in the non-sparse array (`u` in this case) anyway, surely? Unless both are sparse, in which case I've misunderstood even more. In that case, you can simply loop over the keys in the dictionary, and extract the indices from the tuple. Anyway, I'm not up to speed on sparse algebra, let alone the computer science behind optimising algorithms on the topic. Sorry, @zhongqi ! – tehwalrus Oct 12 '11 at 15:14
  • "Unless both are sparse" <-- That's the idea! :) Sorry I wasn't clearer. I was referring to operations between two sparse arrays. (Which is usually the case, when you're dealing with a situation where using sparse matrices is a good solution.) Multiplications between two sparse arrays is the most common use case for a sparse array library, so there's a lot of effort put into optimizing that particular situation. Implementing an efficient sparse tensordot (i.e. multiplication for N-D) is nontrivial. Again, though, your solution is fine. (+1) It's impractical for large, sparse arrays, though – Joe Kington Oct 12 '11 at 15:26
  • @JoeKington edit 2 may help, but I'm not sure. At some point, you'll have to deal with the data that is there, though, which means looping over your dummy indices... – tehwalrus Oct 12 '11 at 15:29
13

An alternative answer as of 2017 is the sparse package. According to its own documentation, it implements sparse multidimensional arrays on top of NumPy and scipy.sparse by generalizing the scipy.sparse.coo_matrix layout.

Here's an example taken from the docs:

import numpy as np
n = 1000
ndims = 4
nnz = 1000000
coords = np.random.randint(0, n - 1, size=(ndims, nnz))
data = np.random.random(nnz)

import sparse
x = sparse.COO(coords, data, shape=((n,) * ndims))
x
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>

x.nbytes
# 16000000

y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))

y
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
TomCho
  • 3,204
  • 6
  • 32
  • 83
  • @JayShin Maybe, but to be honest, I think you'd have to test it out. – TomCho Feb 23 '18 at 17:02
  • 1
    I recommend writing "as of 2017" instead of "as of this year" – MrMartin Aug 22 '18 at 13:12
  • 3
    Any reason for pointing to this fork and not to the original (https://github.com/pydata/sparse source of the fork) one? The one linked here hasn't been updated since 2018 while the one I've linked was recently updated (4th Jan 2021). – 0xc0de Feb 14 '21 at 05:38
  • 1
    @0xc0de The reason is that I wasn't paying enough attention! I fixed it with the original link. Thanks – TomCho Feb 15 '21 at 14:26
7

Have a look at sparray - sparse n-dimensional arrays in Python (by Jan Erik Solem). Also available on GitHub.

AnnanFay
  • 9,573
  • 15
  • 63
  • 86
eldad-a
  • 3,051
  • 3
  • 22
  • 25
3

Rather than writing everything from scratch, it may be nicer to build on scipy's sparse module as far as possible. This can lead to (much) better performance. I had a somewhat similar problem, but I only had to access the data efficiently, not perform any operations on them. Furthermore, my data were only sparse in two out of three dimensions.

I have written a class that solves my problem and could (as far as I can tell) easily be extended to satisfy the OP's needs. It may still hold some potential for improvement, though.

import scipy.sparse as sp
import numpy as np

class Sparse3D():
    """
    Class to store and access 3 dimensional sparse matrices efficiently
    """
    def __init__(self, *sparseMatrices):
        """
        Constructor
        Takes a stack of sparse 2D matrices with the same dimensions
        """
        self.data = sp.vstack(sparseMatrices, "dok")
        self.shape = (len(sparseMatrices), *sparseMatrices[0].shape)
        self._dim1_jump = np.arange(0, self.shape[1]*self.shape[0], self.shape[1])
        self._dim1 = np.arange(self.shape[0])
        self._dim2 = np.arange(self.shape[1])

    def __getitem__(self, pos):
        if not type(pos) == tuple:
            if not hasattr(pos, "__iter__") and not type(pos) == slice: 
                return self.data[self._dim1_jump[pos] + self._dim2]
            else:
                return Sparse3D(*(self[self._dim1[i]] for i in self._dim1[pos]))
        elif len(pos) > 3:
            raise IndexError("too many indices for array")
        else:
            if (not hasattr(pos[0], "__iter__") and not type(pos[0]) == slice or
                not hasattr(pos[1], "__iter__") and not type(pos[1]) == slice):
                if len(pos) == 2:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]]]
                else:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]], pos[2]].T
                    if hasattr(pos[2], "__iter__") or type(pos[2]) == slice:
                        result = result.T
                return result
            else:
                if len(pos) == 2:
                    return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]]))
                else:
                    if not hasattr(pos[2], "__iter__") and not type(pos[2]) == slice:
                        return sp.vstack([self[self._dim1[pos[0]], i, pos[2]]
                                          for i in self._dim2[pos[1]]]).T
                    else:
                        return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]] 
                                          for i in self._dim1[pos[0]]))

    def toarray(self):
        return np.array([self[i].toarray() for i in range(self.shape[0])])
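The core trick in this class, stacking the 2D slices with sp.vstack and turning the first two 3D indices into a single row offset, can be sketched standalone (variable names here are my own; sp.random just generates test data):

```python
import scipy.sparse as sp

n_slices, n_rows, n_cols = 3, 4, 5

# three sparse 2D slices of identical shape stand in for a 3D array
slices = [sp.random(n_rows, n_cols, density=0.3, format="csr")
          for _ in range(n_slices)]

# stacked vertically, element (i, j, k) of the 3D array lives at
# row i * n_rows + j, column k of the 2D stacked matrix
stacked = sp.vstack(slices, format="csr")

i, j, k = 2, 1, 3
assert stacked[i * n_rows + j, k] == slices[i][j, k]
assert stacked.shape == (n_slices * n_rows, n_cols)
```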
Samufi
  • 2,465
  • 3
  • 19
  • 43
  • 1
    I'm in the same situation and this is very useful. I think with a little extra work this could be implemented into `scipy`'s sparse array module. Have you considered it? – TomCho Dec 09 '17 at 17:32
  • @TomCho Thanks! I have not considered implementing this into scipy's sparse module. I think an implementation in scipy should support all the standard numpy matrix features. That would be doable but require a decent chunk of work. Also, I think adding C implementations for operations on these matrices would be much more efficient and more appropriate for scipy. – Samufi Dec 11 '17 at 09:16
0

I also needed a 3D sparse matrix for solving the 2D heat equation (the 2 spatial dimensions are dense, but the time dimension is banded: the diagonal plus one off-diagonal above and below it). I found this link to guide me. The trick is to create an array Number that maps the 2D grid of nodes to a 1D linear vector. Then build the sparse matrix from lists of data and indices. Later the Number matrix is used to arrange the answer back into a 2D array.

[edit] It occurred to me after my initial post that this could be handled better with the .reshape(-1) method. After some research, reshape is preferable to flatten because it returns a view into the original array (where possible), whereas flatten always copies. The code below still uses the original Number array; I will try to update it later. [end edit]
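The view-versus-copy difference mentioned above is easy to check directly:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
v = a.reshape(-1)   # a view (when possible): shares memory with a
c = a.flatten()     # always an independent copy

a[0, 0] = 99
assert v[0] == 99   # the view sees the change
assert c[0] == 0    # the copy does not
assert np.shares_memory(a, v)
assert not np.shares_memory(a, c)
```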

I test it by creating a 1D random vector b and solving for a second vector x. Multiplying x back by the sparse 2D matrix reproduces b.

Note: I repeat this many times in a loop with exactly the same matrix M, so you might think it would be more efficient to solve for inverse(M). But the inverse of M is not sparse, so I think (but have not tested) using spsolve is a better solution. "Best" probably depends on how large the matrix is you are using.
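If the same M really is reused many times, a middle ground between repeated spsolve calls and a dense inverse is to factor M once with scipy's sparse LU and reuse the factorization (a sketch with a made-up 5x5 banded system, not the heat-equation matrix below):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# a small banded sparse system, reused for several right-hand sides
M = sp.diags([[-1.0] * 4, [4.0] * 5, [-1.0] * 4],
             offsets=[-1, 0, 1], format="csc")

lu = spla.splu(M)          # factor once (splu wants CSC format)
for _ in range(3):         # ...then each subsequent solve is cheap
    b = np.random.rand(5)
    x = lu.solve(b)
    assert np.allclose(M @ x, b)
```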

#!/usr/bin/env python3
# testSparse.py
# profhuster

import numpy as np
import scipy.sparse as sM
import scipy.sparse.linalg as spLA
from array import array
from numpy.random import rand, seed
seed(101520)

nX = 4
nY = 3
r = 0.1

def loadSpNodes(nX, nY, r):
    # Matrix to map 2D array of nodes to 1D array
    Number = np.zeros((nY, nX), dtype=int)

    # Map each element of the 2D array to a 1D array
    iM = 0
    for i in range(nX):
        for j in range(nY):
            Number[j, i] = iM
            iM += 1
    print(f"Number = \n{Number}")

    # Now create a sparse matrix of the "stencil"
    diagVal = 1 + 4 * r
    offVal = -r
    d_list = array('f')
    i_list = array('i')
    j_list = array('i')
    # Loop over the 2D nodes matrix
    for i in range(nX):
        for j in range(nY):
            # Recall the 1D number
            iSparse = Number[j, i]
            # populate the diagonal
            d_list.append(diagVal)
            i_list.append(iSparse)
            j_list.append(iSparse)
            # Now, for each rectangular neighbor, add the 
            # off-diagonal entries
            # Use a try-except, so boundary nodes work
            for (jj,ii) in ((j+1,i),(j-1,i),(j,i+1),(j,i-1)):
                try:
                    iNeigh = Number[jj, ii]
                    if jj >= 0 and ii >=0:
                        d_list.append(offVal)
                        i_list.append(iSparse)
                        j_list.append(iNeigh)
                except IndexError:
                    pass
    spNodes = sM.coo_matrix((d_list, (i_list, j_list)), shape=(nX*nY,nX*nY))
    return spNodes


MySpNodes = loadSpNodes(nX, nY, r)
print(f"Sparse Nodes = \n{MySpNodes.toarray()}")
b = rand(nX*nY)
print(f"b=\n{b}")
x = spLA.spsolve(MySpNodes.tocsr(), b)
print(f"x=\n{x}")
print(f"Multiply back together=\n{x * MySpNodes}")
Prof Huster
  • 938
  • 7
  • 14
0

I needed a 3D lookup table for x, y, z and came up with this solution.
Why not combine two of the dimensions into one? i.e. use x and 'yz' as the matrix dimensions.

e.g. if x has 80 potential members, y has 100 potential members and z has 20, you make the sparse matrix 80 by 2000 (i.e. yz = 100x20):
the x dimension is as usual;
in the yz dimension, the first 100 elements represent z=0, y=0 to 99,
..............the second 100 represent z=1, y=0 to 99, etc.
So the element located at (x,y,z) sits in the sparse matrix at (x, z*100 + y).
If you need to use negative numbers, design an arbitrary offset into your matrix translation. The solution could be expanded to n dimensions if necessary.
from scipy import sparse
m = sparse.lil_matrix((80, 2000), dtype=float)

def add_element(xyz, element):
    x, y, z = xyz
    m[x, y + z*100] = float(element)

def get_element(x, y, z):
    return m[x, y + z*100]

add_element((3, 2, 4), 2.2)
add_element((20, 15, 7), 1.2)
print(get_element(0, 0, 0))
print(get_element(3, 2, 4))
print(get_element(20, 15, 7))
print("  This is m sparse:"); print(m)

====================
OUTPUT:
0.0
2.2
1.2
  This is m sparse:
  (3, 402)	2.2
  (20, 715)	1.2
====================