
My function receives combinations of numbers, which can be few or many depending on user input, and loops through each combination to perform some operations.

Below is a line profile I ran on my function; it takes 0.336 seconds to run. While this is fast, it is only one piece of a bigger framework. In that framework, I will need to run this function 50-20,000 times, which multiplied by 0.336 gives 16.8 to 6,720 seconds (I hope that's right). It used to take 0.996 seconds, but I've managed to cut that roughly in half by avoiding function calls.

[screenshot: line-profile output for the function]
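
For reference, the profile was generated roughly like this (a sketch assuming the line_profiler package; the function name and arguments are stand-ins for the real, proprietary code):

# Sketch of the profiling setup, assuming the line_profiler package.
@profile                           # this decorator is injected by kernprof at run time
def run_combinations(dic, combos): # stand-in for the real function
    for combo in combos:
        pass                       # the real per-combination operations go here

# Run from the shell as:
#   kernprof -l -v myscript.py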

The major contributors to the runtime are the two __getitem__ calls, which access a dictionary for information N times, depending on the number of combinations. My dictionary is a collection of data and looks something like this:

dic = {"array1", a,
       "array2", b,
       "array3", c,
       "listofarray", [ [list of 5 array], [list of 5 array], [list of 5 2d Array ] ] 
      }

I was able to cut it by another ~0.01 seconds when I placed the dictionary lookup outside of the loop:

x = dic['listofarray']['list of 5 2d array']

So when I loop to access the 5 different elements, I just do x[i].
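
To give a rough idea of the shape of the loop (a sketch only; the names, data, and inner operations are placeholders, not the real proprietary code):

import numpy as np

# Placeholder data with roughly the shape described above.
dic = {'listofarray': [np.zeros((3, 3)) for _ in range(5)]}
combos = [(0, 1, 2), (3, 4, 5)]           # stand-in for the real combinations

x = dic['listofarray']                    # hoisted: looked up once, outside the loop

for combo in combos:
    for i in range(5):
        arr = x[i]                        # cheap list indexing inside the loop
        # ... the real per-combination operations on arr and combo go here ...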

Other than that, I am lost as to where to look for further performance gains.

Note: I apologize that I haven't provided any code. I'd love to show it, but it's proprietary. I just wanted to get some thoughts on whether I am looking in the right place for speed-ups.

I am willing to learn and apply new things, so if Cython or some other data structure can speed things up, I am all ears. Thanks so much.

PS:

inside my first __getitem__:

[screenshot: profile of the first __getitem__ call]

inside my second __getitem__:

[screenshot: profile of the second __getitem__ call]

EDIT:

I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into NumPy arrays with np.array(i).astype(float).
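
Roughly, the main loop looks like this (food_choices is a placeholder; only its length matters here):

import itertools
import numpy as np

food_choices = ['a', 'b', 'c']            # placeholder; only len(food_choices) matters

for combo in itertools.product(xrange(10), repeat=len(food_choices)):
    arr = np.array(combo).astype(float)   # a tiny new array for every combination
    # ... operations on arr, including the dictionary lookups profiled above ...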

user1234440
  • If that's actually your `dic`, it's not a dictionary, it's a `set`, and you can't be calling `__getitem__` on it. – abarnert Dec 02 '13 at 23:06
  • So your question is "how do I speed up this code that I can't show you?" Not a very easy one to answer. Just remember that premature optimization is the root of all evil. Is this code a bottleneck yet? If not, leave it the hell alone! – John Lyon Dec 02 '13 at 23:06
  • Meanwhile, the two `__getitem__` profiles you posted are very definitely not `dict.__getitem__`, but NumPy/Pandas `__getitem__` functions, so apparently you're looking in the wrong place anyway. – abarnert Dec 02 '13 at 23:07
  • Apologies again for not showing code. I've made a minor update to explain my main loop which loops over the entire combo iterator. – user1234440 Dec 02 '13 at 23:15
  • You've now shown us a tiny bit of code, and explained a tiny bit more, all of which is almost certainly code that isn't relevant to the problem. That's not significantly better than not showing us anything. – abarnert Dec 02 '13 at 23:18

1 Answer


The major contributors to the runtime are the two __getitem__ calls, which access a dictionary for information N times, depending on the number of combinations.

No, it isn't. Your two posted profile traces clearly show that they're NumPy/Pandas __getitem__ functions, not dict.__getitem__. So you're optimizing in the wrong place.

Which explains why moving all the dict stuff out of the loop made a difference of a small fraction of a percent.


Most likely the problem is that you're looping over some NumPy object, or using some fake-vectorized function (e.g., via vectorize), rather than performing some NumPy-optimized broadcasting operation. That's what you need to fix.

For example, if you compare these:

np.vectorize(lambda x: x*2)(a)
a * 2

… the second one will go at least 10x faster on any sizable array, and it's mostly because of all the time spent doing __getitem__—which includes boxing up numbers to be usable by your Python function. (There's also some additional cost in not being able to use CPU-vectorized operations, cacheable tight loops, etc., but even if you arrange things to be complicated enough that those don't enter into it, you're still going to get much faster code.)
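
If you want to see the gap on your own machine, here is a small self-contained timing sketch (the array size and repeat count are arbitrary, not taken from the question):

import timeit
import numpy as np

a = np.arange(1000000, dtype=float)

looped = timeit.timeit(lambda: np.vectorize(lambda x: x * 2)(a), number=10)
broadcast = timeit.timeit(lambda: a * 2, number=10)

print(looped, broadcast)   # the broadcast version should be dramatically faster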


Meanwhile:

I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into NumPy arrays with np.array(i).astype(float).

So you're creating 10**n separate n-element arrays? That's not making sensible use of NumPy. Each array is tiny, and most likely you're spending as much time building and pulling apart the arrays as you are doing actual work. Do you have the memory to build a single giant array with an extra 10**n-long axis instead? Or, maybe, batch it up into groups of, say, 100K? Because then you could actually build and process the whole array in native NumPy-vectorized code.
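
For concreteness, here is one way to build that single array (a sketch, with n standing in for len(food_choices) and assuming 10**n rows fit in memory; chunk along the first axis if they don't):

import numpy as np

n = 4                                     # stand-in for len(food_choices)

# All 10**n combinations as one (10**n, n) float array,
# instead of 10**n tiny arrays built one at a time.
combos = np.indices((10,) * n).reshape(n, -1).T.astype(float)

# Whole-array operations now apply to every combination at once, e.g.:
totals = combos.sum(axis=1)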


However, the first thing you might want to try is simply running your code in PyPy instead of CPython. Some NumPy code doesn't work right with PyPy/NumPyPy, but there are fewer problems with each new version, so you should definitely try it.

If you're lucky (and there's a pretty good chance of that), PyPy will JIT the repeated __getitem__ calls inside the loop, and make it much faster, with no change in your code.

If that helps (or if NumPyPy won't work on your code), Cython may be able to help more. But only if you do all the appropriate static type declarations, etc. And often, PyPy already helps enough that you're done.

abarnert
  • So are you saying I should convert the iterator to a ndarray and perform analysis on that instead? – user1234440 Dec 02 '13 at 23:28
  • @user1234440: Maybe. Or, rather, replace the iterator with NumPy code that generates the same values it would have. See [this question](http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays) for a sample of doing that with a different but similar problem, which runs about 5x faster in pure NumPy than in itertools+NumPy (in CPython). – abarnert Dec 02 '13 at 23:31