
This is a follow-up to this question:

What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?

I'm interested in understanding the speed implications of using a numpy array vs a list of lists when the array is of type object.

If anyone is interested, this is the object I'm using:

import gmpy2 as gm
gm.mpfr('0') # <-- this is the object
evan54

1 Answer


The biggest usual benefits of numpy, as far as speed goes, come from being able to vectorize operations, which means you replace a Python loop around a Python function call with a C loop around some inlined C (or even custom SIMD assembly) code. There are probably no built-in vectorized operations for arrays of mpfr objects, so that main benefit vanishes.
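As a quick illustration (a minimal sketch, assuming gmpy2 is installed): arithmetic on an object array still works, but numpy falls back to calling each element's Python-level methods in a loop rather than running a C inner loop.

import numpy as np
import gmpy2 as gm

a = np.array([gm.mpfr(i) for i in range(4)], dtype=object)

# Works, but numpy calls each element's __add__ one at a time --
# there is no vectorized C inner loop for dtype=object.
print(a + a)   # prints [mpfr('0.0') mpfr('2.0') mpfr('4.0') mpfr('6.0')]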

However, there are some places where you'll still benefit:

  • Some operations that would require a copy in pure Python are essentially free in numpy: transposing a 2D array, slicing a column or a row, even reshaping the dimensions are all done by wrapping a pointer to the same underlying data with different striding information (see the first sketch after this list). Since your initial question specifically asked about A.T, yes, this is essentially free.
  • Many operations can be performed in-place more easily in numpy than in Python, which can save you some more copies; the first sketch below includes an in-place update of a whole column.
  • Even when a copy is needed, it's faster to bulk-copy a big array of memory and then refcount all of the objects than to iterate through nested lists deep-copying them all the way down.
  • It's a lot easier to write your own custom Cython code to vectorize an arbitrary operation with numpy than with Python.
  • You can still get some benefit from using np.vectorize around a normal Python function, pretty much on the same order as the benefit you get from a list comprehension over a for statement (see the second sketch after this list).
  • Within certain size ranges, if you're careful to use the appropriate striding, numpy can allow you to optimize cache locality (or VM swapping, at larger sizes) relatively easily, while there's really no way to do that at all with lists of lists. This is much less of a win when you're dealing with an array of pointers to objects that could be scattered all over memory than when dealing with values that can be embedded directly in the array, but it's still something.
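A minimal sketch of the first two points (assuming gmpy2 is installed; any Python objects would behave the same way):

import numpy as np
import gmpy2 as gm

# A 3x3 object array of mpfr values.
A = np.array([[gm.mpfr(3 * i + j) for j in range(3)] for i in range(3)],
             dtype=object)

# Transposing gives a view onto the same buffer -- nothing is copied.
T = A.T
print(T.base is A)   # prints True: T shares A's data

# In-place update of a whole column, with no explicit Python loop.
A[:, 1] += 1         # calls each mpfr's __add__ and writes back in place
print(A[0, 1])       # prints 2.0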
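And a sketch of the np.vectorize point; the wrapper still makes one Python-level call per element, so expect list-comprehension-level speedups, not C-ufunc-level ones:

import numpy as np
import gmpy2 as gm

xs = np.array([gm.mpfr(i) / 10 for i in range(5)], dtype=object)

# np.vectorize is essentially a convenience loop around gm.sin:
# comparable in speed to a list comprehension, not to a real C ufunc.
v_sin = np.vectorize(gm.sin, otypes=[object])
print(v_sin(xs))     # an object array of mpfr sines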

As for disadvantages… well, one obvious one is that using numpy restricts you to CPython or sometimes PyPy (hopefully in the future that "sometimes" will become "almost always", but it's not quite there as of 2014); if your code would run faster in Jython or IronPython or non-NumPyPy PyPy, that could be a good reason to stick with lists.

abarnert
  • http://stackoverflow.com/questions/26600471/how-do-i-make-gmpy-array-operations-faster I had asked about for-looping over `gmpy2.mpfr` objects, and it seemed that "vectorizing" didn't improve the performance... Just an FYI I guess – evan54 Nov 05 '14 at 22:25
  • Why are in-place operations easier in numpy? I'm assuming you mean something like `E[0,0,0] += 1` is easier than `E[0][0][0] += 1` – evan54 Nov 05 '14 at 22:28
  • @evan54: It probably depends on what you're trying to do. It looks like the `gm.sin` function itself is so slow that the overhead added by Python isn't worth optimizing out. (As the answer says, "Software-based, arbitrary-precision floating point is just slower than native floating point.") But in other cases that might not be true. – abarnert Nov 05 '14 at 22:28
  • @evan54: Consider how you'd replace (or increment, or whatever) a whole column in-place in a list of lists. – abarnert Nov 05 '14 at 22:29
  • Yeah I think I get it. In one case you'd do a `E[0,:,0] += 1` instead of a for loop... or something to that effect – evan54 Nov 05 '14 at 22:34
  • @evan54: Yeah, and the for loop would often be clumsy, to the point where it's easier to rewrite things in terms of a nested copy (especially in the form of a list comprehension). Which is great when that's actually faster (which is true more often than you'd think), but when it isn't, Python is putting hurdles in the way of your optimization and numpy isn't. – abarnert Nov 05 '14 at 22:42
  • A numpy array of dtype object is effectively an array of pointers to the objects, whereas a list of the same objects is an optimized linked list of pointers. Actions like appending, deleting, and inserting different-size sublists are better done with lists. The arrays are better for multidimensional operations (compared to lists of lists). – hpaulj Nov 06 '14 at 02:50
  • @hpaulj: Completely wrong. A Python `list` is not a linked list, it's an array. – abarnert Nov 06 '14 at 21:01
  • OK. But is it still better suited for things like appending, deleting, and inserting? – hpaulj Nov 06 '14 at 22:39
  • @hpaulj: I guess, in that you _can't_ do those things with a numpy array, because it's fixed-size, while you can with a Python list, because it's variable-length and will shift all the other elements up or down. But it's not actually _good_ for those operations (notice that shifting all the other elements takes linear time, and invalidates all indexes and iterators). In general, if your key operations are things like splicing sublists into the middle, you're looking at the wrong type. (Of course often, you're dealing with small enough lists that it really doesn't matter.) – abarnert Nov 06 '14 at 23:01
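A quick sketch of that last point (plain CPython/numpy behavior, nothing gmpy2-specific): list.append mutates in place in amortized constant time, while np.append has to allocate a new array and copy everything.

import numpy as np

lst = [0, 1, 2]
lst.append(3)              # amortized O(1): mutates the list in place

arr = np.array([0, 1, 2])
arr2 = np.append(arr, 3)   # O(n): allocates a new array and copies
print(lst, arr2)           # prints [0, 1, 2, 3] [0 1 2 3]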