3

I'm implementing a RANSAC algorithm for circle detection in images. I profiled the execution and I get:

13699392 function calls in 799.981 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {time.time}
   579810    0.564    0.000    0.564    0.000 {getattr}
   289905    2.343    0.000    8.661    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:226(_get_funcs)
   579810    0.124    0.000    0.124    0.000 {method 'get' of 'dict' objects}
   289905    0.645    0.000    2.676    0.000 {map}
     2954    0.005    0.000    0.005    0.000 {method 'transpose' of 'numpy.ndarray' objects}
     2954    0.023    0.000    0.464    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:179(vstack)
     2954    2.373    0.001    2.373    0.001 {method 'read' of 'cv2.VideoCapture' objects}
   579810    0.966    0.000    2.031    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py:550(asarray_chkfinite)
   289905   10.164    0.000   24.316    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/basic.py:456(lstsq)
     2954    1.090    0.000    1.090    0.000 {normalize}
  1455433    3.827    0.000    3.827    0.000 {numpy.core.multiarray.array}
   579810    2.899    0.000    3.148    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:949(_can_coerce_all)
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     2954   32.544    0.011  795.875    0.269 git/tra-python-processer/tra/ransac.py:31(image_search)
   289905    0.714    0.000   38.644    0.000 git/tra-python-processer/tra/features.py:44(__init__)
   289905    2.157    0.000    2.157    0.000 {method 'randint' of 'mtrand.RandomState' objects}
        1    0.005    0.005    0.005    0.005 {VideoCapture}
   289905    1.026    0.000    1.026    0.000 {method 'astype' of 'numpy.generic' objects}
     2954    0.006    0.000    0.010    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:495(transpose)
   289905   11.303    0.000   37.930    0.000 git/tra-python-processer/tra/features.py:48(__gen)
  3496584    0.343    0.000    0.343    0.000 {len}
     2954    0.344    0.000    0.344    0.000 {numpy.core.multiarray.concatenate}
     2954    3.214    0.001    3.214    0.001 {numpy.core.multiarray.where}
   869715    0.575    0.000    0.575    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2514(size)
   869715    0.778    0.000    2.278    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:394(asarray)
   289905  716.946    0.002  716.946    0.002 git/tra-python-processer/tra/features.py:89(points_distance)
     5908    0.015    0.000    0.031    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:464(asanyarray)
   289905    0.275    0.000    0.275    0.000 {isinstance}
   289905    0.342    0.000    9.003    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/lapack.py:255(get_lapack_funcs)
     5908    0.058    0.000    0.097    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:60(atleast_2d)
   295813    0.089    0.000    0.089    0.000 {method 'append' of 'list' objects}
   289905    0.645    0.000    3.793    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:970(find_common_type)
     2954    0.221    0.000    0.221    0.000 {threshold}
        1    0.000    0.000    0.000    0.000 {method 'get' of 'cv2.VideoCapture' objects}
        1    0.000    0.000    0.000    0.000 git/tra-python-processer/tra/ransac.py:24(__init__)
     2954    0.009    0.000    0.009    0.000 {numpy.core.multiarray.zeros}
   579810    0.143    0.000    0.143    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/misc.py:126(_datacopied)
        1    0.201    0.201  799.981  799.981 git/tra-python-processer/tra/ransac.py:122(video_processing)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     2954    1.528    0.001    1.528    0.001 {cvtColor}
   289905    1.280    0.000    5.346    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:182(find_best_blas_type)
   289905    0.198    0.000    0.198    0.000 {method 'index' of 'list' objects}

It's first time I use profiler, however for what I can understand the function that is most heavy is features.py:89(points_distance) that comes out to be a very easy implementation:

def points_distance(self,points):
    d = n.abs(\
              n.sqrt(\
                     n.power(self.xc - points[:,0],2) + n.power(self.yc - points[:,1],2)
                     )\
              - self.radius
              )
    return d

Any suggestions? Maybe cython?

rdbisme
  • 850
  • 1
  • 13
  • 39
  • Sometimes you can get away with using squared distances everywhere, the square root is the only non-trivial thing that you are doing in this function. – filmor Jun 10 '15 at 08:25
  • @filmor I also thought that removing `n.sqrt` would have dropped the time, but in reality I get a 3% of improvement with `d = n.abs(n.power(self.xc - points[:,0],2) + n.power(self.yc - points[:,1],2)- self.radius*self.radius)` implementation – rdbisme Jun 10 '15 at 09:20
  • You can also get rid of the `n.abs`, `sum` is fine since your squares are positive. Is squaring optimised in numpy? – filmor Jun 10 '15 at 09:32

1 Answers1

2

Use scipy.spatial.distance.cdist for the distance calculation in points_distance.

First, optimize your code in pure Python and numpy. Then if necessary port the critical parts to Cython. Since a number of functions are called repeatedly a few ~100000 times, you should get some speed up from Cython for those parts. Unless, of course, the computational bottleneck is in the distance calculation, which will then limit the overall execution time.

By the way, you should sort your profiler results by tottime so they are easier to read.

rth
  • 10,680
  • 7
  • 53
  • 77
  • 1
    This leads to a ~60% of improvement. Now the `tottime` of the `points_distance` function [is in the order of magnitude](http://pastebin.com/LYvsdBS1) of the other methods. – rdbisme Jun 10 '15 at 09:32