
I need to create a matrix starting from the values of a weight matrix. Which is the best structure to hold the matrix in terms of speed, both when creating it and when iterating over it? I was thinking about a list of lists or a numpy 2D array, but they both seem slow to me. Here is what I need:

numpy array
A = np.zeros((dim, dim))
for r in range(A.shape[0]):
    for c in range(A.shape[1]):
        if r == c:
            A.itemset((r, c), node_degree[r])
        else:
            A.itemset((r, c), arc_weight[r, c])

or

list of lists
l = []
for r in range(dim):
    l.append([])
    for c in range(dim):
        if r == c:
            l[r].append(node_degree[r])
        else:
            l[r].append(arc_weight[r, c])

where dim can be as large as 20000, node_degree is a vector and arc_weight is another matrix. I wrote the same thing in C++ and it takes less than 0.5 seconds, while the two Python versions above take more than 20 seconds. I know Python is not C++, but I need it to be as fast as possible. Thank you all.

Nadir
  • Would a sparse matrix work in your case (i.e. mostly zeros)? – Adam May 29 '14 at 17:20 (see the sketch after these comments)
    Try Cython. It should be able to reach the C speed, while still having the Python interface. – Yuxiang Wang May 29 '14 at 17:23
  • You could also consider using a dict, where the key is the row/col. That would give you fast lookups, although I don't think it would save you anything on creation. – Adam May 29 '14 at 17:24
  • You shouldn't be appending to the list if you already know its size. Preallocate the memory first using ' l = [[0 for x in xrange(m)] for x in xrange(n)] '. See http://stackoverflow.com/a/6667288/1290264. – bcorso May 29 '14 at 17:37
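
Following up on Adam's sparse-matrix suggestion above: a minimal sketch of what that could look like, assuming arc_weight is mostly zeros and held as a scipy.sparse matrix (the names and data below are made-up stand-ins, not the OP's):

import numpy as np
import scipy.sparse as sp

dim = 20000
node_degree = np.arange(dim)        # stand-in degree vector

# Stand-in sparse arc_weight; LIL format is cheap to fill incrementally.
arc_weight = sp.lil_matrix((dim, dim))
arc_weight[0, 1] = 2.5              # ... fill in the nonzero arcs here ...

A = arc_weight.copy()               # leave arc_weight untouched
A.setdiag(node_degree)              # diagonal gets the node degrees
A = A.tocsr()                       # CSR is fast for arithmetic and row access

This only pays off if the matrix really is mostly zeros; note that a dense 20000x20000 array of doubles already needs about 3.2 GB, so sparsity matters for memory as well as speed.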

3 Answers


One thing: you shouldn't be appending to the list if you already know its size.

Preallocate the memory first using a list comprehension, and generate the r, c values with xrange() instead of range(), since you are using Python < 3.x:

l = [[0 for c in xrange(dim)] for r in xrange(dim)]

Better yet, you can build what you need in one shot using:

l = [[node_degree[r] if r == c else arc_weight[r,c] 
            for c in xrange(dim)] for r in xrange(dim)]

Compared to your original implementation, this should use less memory (because of the xrange() generators) and less time, because specifying the dimensions up front removes the need to repeatedly reallocate memory.

bcorso
  • Using `range` or `xrange` is a minor issue here. Time spent creating lists of `n` elements is negligible when compared to pushing the data around for the `n^2` matrix elements. With large sizes, any solution that involves dynamic lists or loops (whether explicit or in comprehensions) is a bad idea. This is precisely the problem that `numpy` was created to solve. – Javier May 30 '14 at 10:37
  • @Javier, thanks for the comment! My answer addresses time/memory improvements using native Python. I do not think `xrange` will solve the OP's problem -- I'm simply offering the most efficient pythonic way of implementing the OP's code in native Python. Clearly, if the OP still has issues, a non-native solution such as `numpy` is necessary. – bcorso May 30 '14 at 16:58
  • I understand your point of view, and certainly your solution is faster than the double loop, but since the question specifically mentions the dimensions of the problem (20000x20000) I think a good answer must address the specific issues that arise with large data structures (excessive type checking, need of contiguous memory blocks for fast iteration). If for some reason one wanted to use only the standard library, the right data structure to use would be an `array`, never a list, but since the OP already used `numpy` in the question I see no reason not to use it. – Javier May 31 '14 at 08:23 (see the `array` sketch after this thread)
  • Your solution did a great job covering those topics, why would I reproduce them in my solution? I posted this answer to address a topic that was not addressed in your answer, i.e., the OP's attempt at using a list[list] structure. If the OP isn't using it the most efficient way it's hard to say if numpy is needed or if it will just be premature optimization. Anyway, I think your answer is good so I will not duplicate it in my answer. – bcorso May 31 '14 at 19:10
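
As an aside on Javier's standard-library remark, a minimal sketch of the `array`-based layout he alludes to, with stand-in data rather than the OP's: unlike a list of boxed Python objects, the `array` module stores unboxed C doubles in one contiguous block.

from array import array

dim = 2000
node_degree = range(dim)                 # stand-in degree vector

# One contiguous block of C doubles; cell (r, c) lives at index r*dim + c.
flat = array('d', [0.0]) * (dim * dim)
for r in xrange(dim):
    flat[r * dim + r] = node_degree[r]   # write the diagonal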

Numpy matrices are generally faster as they know their dimensions and entry type.

In your particular situation, since you already have the arc_weight and node_degree structures created, you can build your matrix directly from arc_weight and then replace the diagonal:

A = np.matrix(arc_weight)
np.fill_diagonal(A, node_degree)

Another option is to replace the double loop with a function that puts the right element in each position and create a matrix from the function:

def fill_matrix(r, c):
    return arc_weight[r,c] if r != c else node_degree[r]

A = np.fromfunction(fill_matrix, (dim, dim))

As a rule of thumb, with numpy you should avoid loops at all costs. The first method should be faster, but you should profile both to see which works for you. You should also take into account that you seem to be duplicating your data set in memory, so if it is really huge you might run into trouble. The best option would be to create your matrix directly, avoiding arc_weight and node_degree altogether.

Edit: Some simple time comparisons between list comprehension and numpy matrix creation. Since I don't know how your arc_weight and node_degree are defined, I just made up two simple stand-in functions. It seems that numpy.fromfunction complains if the function contains a conditional, so I construct the matrix in two steps.

import numpy as np

def arc_weight(a,b):
    return a+b

def node_degree(a):
    return a*a

def create_as_list(N):
    return [[arc_weight(c,r) if c!=r else node_degree(c) for c in xrange(N)] for r in xrange(N)]

def create_as_numpy(N):
    A = np.fromfunction(arc_weight, (N,N))
    np.fill_diagonal(A, node_degree(np.arange(N)))
    return A

And here are the timings for N=2000:

time A = create_as_list(2000)
CPU times: user 839 ms, sys: 16.5 ms, total: 856 ms
Wall time: 845 ms

time A = create_as_numpy(2000)
CPU times: user 83.1 ms, sys: 12.9 ms, total: 96 ms
Wall time: 95.3 ms
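
The two-step construction can also be collapsed into a single numpy.fromfunction call; here is a sketch, reusing the stand-in functions above. fromfunction calls the function once with whole index arrays rather than scalar r, c, which is why a plain `if r != c` is ambiguous; an elementwise np.where avoids that (at the cost of evaluating both branches in full, so it does a little extra work on the diagonal):

def fill_matrix(r, c):
    # r and c arrive as full index arrays, not scalars; np.where
    # selects between the two expressions elementwise.
    return np.where(r != c, arc_weight(r, c), node_degree(r))

A = np.fromfunction(fill_matrix, (2000, 2000))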
Javier
  • The second is giving an error: fill_matrix() takes exactly 2 arguments (1 given). Could you please correct it? – Nadir May 29 '14 at 19:46
  • Sorry, `vectorize` is only for one-parameter functions. I have replaced the second version with a working one using `np.fromfunction`. Still, if your matrix is large, your biggest issue here is that you seem to be duplicating all your data in memory; avoiding that will be a better improvement than any workaround we post here. – Javier May 30 '14 at 10:32

Make a copy of arc_weight and fill the diagonal with values from node_degree. For a 20000-by-20000 output, it takes about 1.6 seconds on my machine:

>>> import numpy
>>> dim = 20000
>>> arc_weight = numpy.arange(dim**2).reshape([dim, dim])
>>> node_degree = numpy.arange(dim)
>>> import timeit
>>> timeit.timeit('''
... A = arc_weight.copy()
... A.flat[::dim+1] = node_degree
... ''', '''
... from __main__ import dim, arc_weight, node_degree''',
... number=1)
1.6081738501125764

Once you have your array, try not to iterate over it. Compared to broadcasted operators and NumPy built-in functions, Python-level loops are a performance disaster.
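
As a small illustrative sketch of that gap (a made-up example, not from the answer itself): summing the rows of a 2000x2000 matrix with a Python-level loop versus numpy's built-in reduction; the built-in version is typically orders of magnitude faster.

import numpy as np

A = np.random.rand(2000, 2000)

# Python-level loop: one interpreter round-trip per element.
row_sums_slow = [sum(A[r, c] for c in xrange(A.shape[1]))
                 for r in xrange(A.shape[0])]

# Vectorized: a single C-level reduction along axis 1.
row_sums_fast = A.sum(axis=1)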

user2357112