6

I developed an application which simulates N robots moving in a grid which try to maximize the amount of visited grid cells in a limited amount of steps, meeting in a goal point. It all works correctly, but is horrible slow. It is currently python+numpy+mathplotlib.

Maximum robots there can be have a soft limit of 100 (if it can get higher, it is nice to have).

To do that, I do the following, simplified:

while steps > 0:
    for robot in robots:
        agent.calc(robot,steps)

A robot is a 1x2 numpy array (x-and-y-coordinates).

The agent here decides what to do. Since I need to switch the tactic and strategy on the fly, I cannot move that logic.

agent.calc updates a robot in place, one after another.

cProfiling it returns the following. Extracting the top

         39014272 function calls (39010490 primitive calls) in 150.314 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12417735   62.807    0.000   62.807    0.000 distance.py:8(taxicab_distance)
   124596   36.882    0.000   36.882    0.000 {numpy.core.multiarray.array}
   113657   30.204    0.000  100.800    0.001 logical_agent.py:16(choose_max_distance_to...)
 12417013    6.579    0.000   69.384    0.000 squaregrid.py:30(distance)
   113700    2.900    0.000  109.769    0.001 logical_agent.py:73(calc)
 11652363    2.625    0.000    2.625    0.000 {method 'append' of 'list' objects}
   161849    1.653    0.000    1.653    0.000 distance.py:11(euclidean_distance)
   113664    1.632    0.000    1.632    0.000 {sorted}
   114834    1.185    0.000    1.185    0.000 {method 'keys' of 'dict' objects}
   113700    0.695    0.000    1.134    0.000 squaregrid.py:19(neighbours)

I implemented different environments for the robots, the most important is squaregird. Every environment has its own distance function, since I intended to use different metrics, i.e. Manhattan/taxicab and euclidean. I extracted the distance function into an own distance.py file, since I use it in several occasions.

One can see that taxicab_distance is called alot, since the agent needs to evaluate the distances of a robots four neighbours and itself to a goal point to see whether the next position can still reach the goal and to maximize the distance to all other robots as a optimizing heuristics.

The function does not do anything fancy, just

def taxicab_distance(u, v):
    return np.abs(u[0] - v[0]) + np.abs(u[1] - v[1])

I know that python has a pretty high function call overhead, and I assume that that hits the performance. The {numpy.core.multiarray.array} can be ignored, I think I know what I am doing wrong there.

Distance call chain: agent -> environment.distance -> taxicab_distance

The question is, how can I reduce the overhead of calling the function? I strongly considered using pythons c extensibility, cython, to be more concrete. Will it work? can there be another reason why it is so slow?

jcklie
  • 4,054
  • 3
  • 24
  • 42
  • I doing very much that the overhead of *calling* the function is significant, seeing as it is almost certainly dominated by the actual work done by the function. – Daniel Roseman Jan 01 '14 at 15:06
  • 7
    As an experiment, temporarily stop using `taxicab_distance`. Instead, execute function's code directly: `np.abs(u[0] - v[0]) + np.abs(u[1] - v[1])`. Does it help the performance in an appreciable way? My guess is that you should be hunting big (a better algorithm, caching strategies, etc), not small (micro-optimizations like function call overhead). – FMc Jan 01 '14 at 15:11
  • 1
    Are you taking advantage of vectorized operations? It doesn't look like it. If you don't, using NumPy will actually be slower than using regular lists. – user2357112 Jan 01 '14 at 15:15
  • 1
    `method 'keys' of 'dict' objects` - this is unlikely to have a major impact on your runtime, but why are you calling that method? Most of the time, it's unnecessary. If you want to iterate over a dict's keys, just use `for key in d`. – user2357112 Jan 01 '14 at 15:17
  • @user2357112 I dont need to iterate over the dict keys, I store them to animate them later. For every step, I snapshot the grid cells and robot positions for matplotlib animate. – jcklie Jan 01 '14 at 15:45
  • I inlined the function, and it bought me 20 seconds. But you are totally right, since it takes now 100s with inlined, and 120s without. Another algorithm must be thought of. – jcklie Jan 01 '14 at 16:07
  • If `u[0]` is a scalar, try `abs(u[0] - v[0]) + abs(u[1] - v[1])`. This (may) avoid unnecessary conversion to arrays. – hpaulj Jan 01 '14 at 18:15
  • @DanielRoseman CPython's function calling mechanism (the function Py_Call in C) is exceptionally expensive: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation – kfsone Jan 24 '23 at 19:52

2 Answers2

5

First, I'd rewrite it into:

def taxicab_distance(u, v):
     return np.sum(np.abs(u - v))

Can you compute taxicab_distance for many robots at once?

Yariv
  • 12,945
  • 19
  • 54
  • 75
  • 1
    exactly. to take advantage of the performance benefits of numpy, you need to vectorize your code; which lifts the inner loops from python to the C-level. – Eelco Hoogendoorn Jan 01 '14 at 16:49
  • How would I vectorize my **agent.calc(robot,steps)** ? The underlying data structure is an array of robots(2d-points), and I potentially could return the value of the call instead of updating in place. – jcklie Jan 01 '14 at 17:21
  • You need to show us what agent.calc does. I suggest creating a question a la "How to vectorize ..." – Yariv Jan 01 '14 at 19:31
3

I benchmarked it with inlining, and it took off ~15sec. In the end, I rewrote the number crunching in C++ and used cython for integrating. After that, it took only 1 second.

EDIT: cpython -> cython

jcklie
  • 4,054
  • 3
  • 24
  • 42
  • do you really mean 'cpython', or something else? – BBedit Mar 31 '15 at 17:25
  • I meant cython to call C++ from Python. – jcklie Mar 31 '15 at 21:47
  • You gave yourself an XY problem; numpy is designed to work with aggregate data because of the exceptionally high overhead python has for making function calls, see https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy/47542304#47542304. You could have used `np.sum(np.abs(np.subtract(robot, neighbors)), axis=1)`, and perhaps 'neighbors' would be the list of all robots, for cache coherence. – kfsone Jan 24 '23 at 21:02