1

Consider the following code which is a solution to the "circle stabbing problem" of computational geometry, namely finding the line that intersects a maximal number of non-intersecting circles.

The main question here is why the Python code is 42x slower than the C++ code. Is it because some improper use of Python? Or is it due to Python being inherently this slower than C++ at arithmetic and math operations? Is there anything that can be done to make it a bit faster?

First, the Python code:

from __future__ import division
from math import sqrt, atan2, pi
from sys import stdin

__author__ = "Sahand Saba"


EPS = 1e-6


class Point(object):
    __slots__ = ['x', 'y']

    def __init__(self, xx=0.0, yy=0.0):
        self.x = float(xx)
        self.y = float(yy)

    def angle(self):
        """
        Returns the angle the point makes with the x-axis,
        counter-clockwise. Result is in the [0, 2*pi) range.
        """
        a = atan2(self.y, self.x)
        if a < 0:
            a = 2 * pi + a
        return a


class Circle(object):
    __slots__ = ['center', 'radius']

    def __init__(self, center, radius):
        self.center = center
        self.radius = float(radius)

    def common_tangents(self, other):
        """
        Returns [[p1,p2],[q1,q2]] with p1, p2, q1, q2 all Point objects
        representing points on C1 such that the tangent lines to C1 at p1, p2,
        q1, q2 are tangent to C2 as well. Further more, p1 and p2 represent
        external tangent lines, while q1 and q2 represent internal ones. It is
        also guaranteed that p1 and q1 are both on the left-side of the line
        connecting C1.center to C2.center, and p2 and q2 are on the right-side
        of it.
        """
        C1, C2 = self, other
        mu = C1.center.x - C2.center.x
        eta = C1.center.y - C2.center.y
        r1 = C1.radius
        r2 = C2.radius
        r1r1 = r1 * r1
        r1r2 = r1 * r2
        delta1 = r1r1 - r1r2
        delta2 = r1r1 + r1r2
        mumu = mu * mu
        etaeta = eta * eta
        D = etaeta + mumu
        result = [[], []]
        if abs(D) < EPS:
            return result

        if abs(eta) < EPS:
            # In this case there is symmetry along the x-axis and we can
            # not divide by eta. Use x^2 + y^2 = r^2 to find y.
            dmu = -1 if mu < 0 else 1
            x = (-delta1 * mu) / D
            y = -dmu * sqrt(r1r1 - x * x)
            result[0].append(Point(x, y))
            result[0].append(Point(x, -y))
            x = (-delta2 * mu) / D
            y = -dmu * sqrt(r1r1 - x * x)
            result[1].append(Point(x, y))
            result[1].append(Point(x, -y))
        else:
            # Here, the symmetry is along the line connecting the two centers.
            # Use the equation eta*y + mu *x + r1^2 - r1 * r2 = 0 to derive y
            # since we can divide by eta.
            dd1 = delta1 * delta1
            dd2 = delta2 * delta2
            Delta1 = sqrt(dd1 * mumu - D*(dd1 - etaeta * r1r1))
            Delta2 = sqrt(dd2 * mumu - D*(dd2 - etaeta * r1r1))
            deta = -1 if eta < 0 else 1
            x = (-delta1 * mu + deta * Delta1) / D
            result[0].append(Point(x, -(mu*x + delta1)/eta))
            x = (-delta1 * mu - deta * Delta1) / D
            result[0].append(Point(x, -(mu*x + delta1)/eta))
            x = (-delta2 * mu + deta * Delta2) / D
            result[1].append(Point(x, -(mu*x + delta2)/eta))
            x = (-delta2 * mu - deta * Delta2) / D
            result[1].append(Point(x, -(mu*x + delta2)/eta))

        return result


def add_events(A, p, q):
    start = p.angle()
    end = q.angle()
    A.append((start, 1, p))
    A.append((end, -1, q))
    return 1 if start > end else 0


def max_intersecting_line(C):
    """
    Given a list of circles, returns (m, c, p) where m is the maximal number of
    circles in C any line can intersect, and p is a point on a circle c in C
    such that the tangent line to c at p intersects m circles in C.
    """
    global_max = 1
    solution_circle = C[0]
    solution_point = Point(C[0].radius, 0.0)
    for c1 in C:
        local_max = 1
        A = []

        for c2 in (c for c in C if c is not c1):
            Q = c1.common_tangents(c2)
            t1 = add_events(A, Q[1][0], Q[0][0])
            t2 = add_events(A, Q[0][1], Q[1][1])
            local_max += max(t1, t2)

        if local_max > global_max:
            global_max = local_max
            solution_point = Point(c1.radius, 0.0)
            solution_circle = c1

        A.sort(key=lambda a: a[0])
        for a in A:
            local_max += a[1]
            if local_max > global_max:
                global_max = local_max
                solution_point = Point(c1.center.x + a[2].x,
                                       c1.center.y + a[2].y)
                solution_circle = c1
    return global_max, solution_circle, solution_point


if __name__ == '__main__':
    T = int(stdin.readline())
    for __ in xrange(T):
        n = int(stdin.readline())
        C = []
        for i in xrange(n):
            x, y, r = tuple(stdin.readline().split(' '))
            C.append(Circle(Point(x, y), r))
        print max_intersecting_line(C)[0]

And the almost line-for-line equivalent C++ code:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>

using namespace std;

double EPS = 1e-6;

class Point {
    public:
        double x, y;
        Point(double x=0.0, double y=0.0) : x(x), y(y) {}
        double angle() const {
            double a = atan2(y, x);
            if (a < 0) {
                a = atan(1) * 8.0 + a;
            }
            return a;
        }
};

class Event {
    public:
        double angle;
        double count;
        Event(double angle = 0, int count = 1) : angle(angle), count(count) {}
        bool operator<(const Event &o) const {
            return angle < o.angle;
        }
};

struct CircleCircleTangents {
    public:
        Point external[2];
        Point internal[2];
};

class Circle {
    public:
        Point center;
        double radius;
        Circle(double x=0.0, double y=0.0, double r=1.0) : radius(r), center(x,y) {}

        // external[0] and internal[0] are guaranteed to be on the left-side of
        // the directed line contennting C1.center to C2.center
        CircleCircleTangents commonTangents(const Circle& C2) const {
            const Circle& C1 = *this;
            double mu = C1.center.x - C2.center.x;
            double eta = C1.center.y - C2.center.y;
            double r1 = C1.radius;
            double r2 = C2.radius;
            double r1r1 = r1 * r1;
            double r1r2 = r1 * r2;
            double delta1 = r1r1 - r1r2;
            double delta2 = r1r1 + r1r2;
            double D = eta*eta + mu*mu;
            CircleCircleTangents result;
            if (abs(eta) < EPS){
                // Do not divide by eta! Use x^2 + y^2 = r^2 to find y.
                double dmu = mu < 0? -1 : 1;
                double x = (-delta1 * mu) / D;
                double y = -dmu * sqrt(r1r1 - x * x);
                result.external[0].x = x;
                result.external[0].y = y;
                result.external[1].x = x;
                result.external[1].y = -y;
                x = (-delta2 * mu) / D;
                y = -dmu * sqrt(r1r1 - x * x);
                result.internal[0].x = x;
                result.internal[0].y = y;
                result.internal[1].x = x;
                result.internal[1].y = -y;
            } else {
                // Dividing by eta is ok. Use mu*x + eta*y + delta = 0 to find y.
                double mumu = mu * mu;
                double etaeta = eta * eta;
                double dd1 = delta1 * delta1;
                double dd2 = delta2 * delta2;
                double deta = eta < 0? -1 : 1;
                double Delta1 = deta * sqrt(dd1 * mumu - D*(dd1 - etaeta * r1r1));
                double Delta2 = deta * sqrt(dd2 * mumu - D*(dd2 - etaeta * r1r1));
                double x = (-delta1 * mu + Delta1) / D;
                result.external[0].x = x;
                result.external[0].y = -(mu*x + delta1)/eta;
                x = (-delta1 * mu - Delta1) / D;
                result.external[1].x = x;
                result.external[1].y = -(mu*x + delta1)/eta;
                x = (-delta2 * mu + Delta2) / D;
                result.internal[0].x = x;
                result.internal[0].y = -(mu*x + delta2)/eta;
                x = (-delta2 * mu - Delta2) / D;
                result.internal[1].x = x;
                result.internal[1].y = -(mu*x + delta2)/eta;
            }
            return result;
        }
};

bool add_events(vector<Event>& A, const Point& p, const Point& q) {
    double start = p.angle();
    double end = q.angle();
    A.push_back(Event(start, 1));
    A.push_back(Event(end, -1));
    return start > end;
}

// Given a list of circles, returns (m, c, p) where m is the maximal number of
// circles in C any line can intersect, and p is a point on a circle c in C
// such that the tangent line to c at p intersects m circles in C.
int max_intersecting_line(const Circle* C, int n) {
    int global_max = 1;
    vector<Event> A;
    for(int i = 0; i < n; i++) {
        const Circle& c1 = C[i];
        A.clear();
        int local_max = 1;
        for(int j = 0; j < n; j++) {
            if(j == i) continue;
            const Circle& c2 = C[j];
            CircleCircleTangents Q = c1.commonTangents(c2);
            bool t1 = add_events(A, Q.internal[0], Q.external[0]);
            bool t2 = add_events(A, Q.external[1], Q.internal[1]);
            if(t1 || t2) {
                local_max++;
            }
        }

        if (local_max > global_max) {
            global_max = local_max;
        }

        sort(A.begin(), A.end());
        for(int i = 0; i < A.size(); i++) {
            local_max += A[i].count;
            if(local_max > global_max) {
                global_max = local_max;
            }
        }
    }
    return global_max;
}

int main() {
    Circle C[2000];
    int T;
    cin >> T;
    for (int t = 0; t < T; t++) {
        int n;
        cin >> n;
        for (int i = 0; i < n; i++) {
            cin >> C[i].center.x >> C[i].center.y >> C[i].radius;
        }

        cout << max_intersecting_line(C, n) << endl;
    }
    return 0;
}

And their performance difference:

$ time ./janeway < io/Janeway.in > /dev/null

real    0m8.436s
user    0m8.430s
sys 0m0.003s

$ time python janeway.py < io/Janeway.in > /dev/null

real    5m57.899s
user    5m57.217s
sys 0m0.165s

As you can see, the C++ code is about 42 times faster.

(Test input is taken from ACM ICPC regional's problem from 2013. See http://www.acmicpc-pacnw.org/results.htm problem "Janeway" of 2013.)

EDIT: Here is the cProfile output for the Python code:

         799780565 function calls in 507.293 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 __future__.py:48(<module>)
        1    0.000    0.000    0.000    0.000 __future__.py:74(_Feature)
        7    0.000    0.000    0.000    0.000 __future__.py:75(__init__)
        1    0.047    0.047  507.293  507.293 janeway.py:1(<module>)
       25   63.703    2.548  507.207   20.288 janeway.py:103(max_intersecting_line)
        1    0.000    0.000    0.000    0.000 janeway.py:11(Point)
 24250014    5.671    0.000    5.671    0.000 janeway.py:116(<genexpr>)
 96926032    9.343    0.000    9.343    0.000 janeway.py:127(<lambda>)
 96955733   57.902    0.000   57.902    0.000 janeway.py:14(__init__)
 96926032   46.840    0.000   63.156    0.000 janeway.py:18(angle)
        1    0.000    0.000    0.000    0.000 janeway.py:29(Circle)
    18506    0.009    0.000    0.009    0.000 janeway.py:32(__init__)
 24231508  167.128    0.000  245.945    0.000 janeway.py:36(common_tangents)
 48463016   59.402    0.000  129.139    0.000 janeway.py:95(add_events)
 48463016    4.106    0.000    4.106    0.000 {abs}
 96926032   16.315    0.000   16.315    0.000 {math.atan2}
 48463016    4.908    0.000    4.908    0.000 {math.sqrt}
 24231508    9.483    0.000    9.483    0.000 {max}
193870570   18.503    0.000   18.503    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    18532    0.009    0.000    0.009    0.000 {method 'readline' of 'file' objects}
    18506   43.918    0.002   53.261    0.003 {method 'sort' of 'list' objects}
    18506    0.007    0.000    0.007    0.000 {method 'split' of 'str' objects}
Sahand
  • 2,095
  • 1
  • 19
  • 24
  • 3
    Have you profiled it to see where the python code is spending its time? At first glance you're doing some weird things in the python code, like compiling the `add_events` closure each time in your `for c1 in C:` loop. You're certainly not doing that in the "line-for-line equivalent C++ code." – roippi Apr 21 '14 at 19:34
  • `python -m cProfile your_script.py` will show you which calls are taking the time. – aychedee Apr 21 '14 at 19:38
  • @roippi: Good point! I changed the code and moved ``add_events`` to outside the for loop. Also added cProfile output to the question. Have a look at the modified code and the output above. No noticeable performance gain as far as I can tell. – Sahand Apr 21 '14 at 19:53
  • Just as a small note, since most of your time is spent in max_intersecting_line and you spend a lt of time in `sort`, try replacing `A.sort(key=lambda a: a[0])` with `A.sort(key=operator.itemgetter(0))` – dawg Apr 21 '14 at 19:59
  • a good introduction: [How to optimize for speed](http://scikit-learn.org/dev/developers/performance.html) (the advice is applicable also outside of scikit-learn package). – jfs Apr 21 '14 at 20:04
  • @dawg: I don't disagree with the recommendation to use `itemgetter` instead of `lambda` where practical; but for this particular case, why is the `key` parameter needed at all? Isn't sorting by the first (sub)element the natural sort order anyway? – John Y Apr 21 '14 at 20:13
  • Yes -- agreed -- but I there *may* be a reason to only use the first element since Python's stable sort leaves the unsorted elements in the same order. I did not look in detail enough to see. Looking at the C++ code, you would appear to be on to an even better suggestion. – dawg Apr 21 '14 at 20:17
  • @dawg: Good suggestion. The `key` is not needed at all, and removing it results in better, more readable code, and likely at least a bit faster too. However, it resulted in no noticeable performance gain overall in this case. – Sahand Apr 21 '14 at 20:53

3 Answers3

1

Python is an interpreted language, C++ is compiled. In general, for every arithmetic expression such as "1 + 2", three objects are created on the heap, one for the number "1", one for the number "2" and another for the result, "3". In C++ it all boils down to much simpler assembler operations after it is compiled. Hence, for most numerical code such large discrepancy in performance is to be expected.

In certain situations you can make things much faster by using numpy arrays, and numpy expressions. For more details see: http://wiki.scipy.org/PerformancePython

Tiago Peixoto
  • 5,149
  • 2
  • 28
  • 28
  • Saying that Python is interpreted isn't really quite accurate: http://stackoverflow.com/questions/745743/is-python-interpreted-like-javascript-or-php, http://stackoverflow.com/questions/6889747/is-python-interpreted-or-compiled-or-both?lq=1 – Winston Ewert Apr 21 '14 at 19:57
1

In C++, a compiler can often turn an arithmetic operation into a single processor instruction.

In Python you face a double whammy. The code is interpreted which introduces overhead on every operation. Not only that but Python can't assume the objects are numbers, it must inspect them to even know which operation to perform. For example you can use + on two numbers to add them, or you can use + on two strings to concatenate them. Python doesn't know ahead of time if the variables are numbers or strings.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
0

Python - is an interpreted language, while c++ - compiled. To accelerate the Python code you can try PyPy, Cython or Shedskin.

NorthCat
  • 9,643
  • 16
  • 47
  • 50