Optimizing iteration using itertools.izip

Question

I have an algorithm that iterates through all of the nonzero values of a matrix which looks something like this:

for row, col, val in itertools.izip(matrix.row, matrix.col, matrix.data):
    dostuff(row, col, val)

I realize that this is the fastest way of iterating through a sparse matrix in numpy, as was discussed in Iterating through a scipy.sparse vector (or matrix).

My problem is that the function I perform at each evaluation takes another vector, let's call it vec, and does nothing if vec[row] is equal to 0, which in some cases is true for the majority of rows.

Therefore, I do not want to iterate through all the nonzero triplets (row, col, val) in the matrix for which vec[row] != 0.

What I currently do is the simple and stupid solution of

import numpy as np
import scipy.sparse as sp
import itertools

N = 10000
matrix = sp.rand(N, N, density=0.0001, format='coo', dtype=None, random_state=None)
vec = np.zeroes(N)
s = 0
for row, col, val in itertools.izip(matrix.row, matrix.col, matrix.data):
    if vec[row] != 0:
        s += vec[row] * val # in reality, some other function is here

which works and works quicker than the original code if there are not many rows with vec[row]!=0. However, the code runs slowly in the case when all values of vec are nonzero, and that is a case I am not allowed to ignore (for example, if vec=np.ones(len(matrix.data)).

Therefore, I need some sort of extension of izip which will allow me to "conditionally" iterate through its output, so that I would write something like

for row, col, val in itertools.izip(matrix.row, matrix.col, matrix.data, lambda x: vec[x[0]] !> 0):
    dostuff(row, col, val)

What are your suggestions? What will be the quickest way to do this?

Well, you're still missing an import. But close enough; I was able to test my ideas and get them working and that's the important part. — Veedrac, Oct 02 '14 at 10:25

score 2 · Accepted Answer · answered Oct 02 '14 at 10:24

2

You can just use Numpy's special indexing on the rows, columns and data:

which = vec[matrix.row] != 0

rows = matrix.row[which]
cols = matrix.col[which]
data = matrix.data[which]

for row, col, val in itertools.izip(rows, cols, data):
    s += vec[row] * val

answered Oct 02 '14 at 10:24

Veedrac

58,273
15
112
169

That looks wonderful, thank you. I will try it out to see how much faster my algorithm becomes. If you don't mind, I will leave the question open for a while to see if some other suggestions come in, otherwise I will accept this question. – 5xum Oct 02 '14 at 10:27
Whis is an amazing solution. The original code now still runs about 20 percent faster than the modified code if `vec` has no nonzero elements, but if `vec` is sparse, this code quicker by 2 to 3 orders of magnitude. Thank you. – 5xum Oct 02 '14 at 10:43

Optimizing iteration using itertools.izip

1 Answers1