
This code performs Haversine distance calculations and is part of a larger project.

The Java implementation runs about 60x faster than the Python one, even though the two implementations are nearly identical.

Both were measured on the same machine and OS. I have also tried various combinations:

  1. OS : Linux and Windows
  2. Machine with different CPUs (i5 from 4th and 6th gen)

but the difference in execution speed is consistent.

Python implementation:

from math import radians, sin, cos, sqrt, asin
import time 

def haversine(lat1, lon1, lat2, lon2):

  R = 6372.8 # Earth radius in kilometers

  dLat = radians(lat2 - lat1)
  dLon = radians(lon2 - lon1)
  lat1 = radians(lat1)
  lat2 = radians(lat2)

  a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
  c = 2*asin(sqrt(a))

  return R * c

start_time = time.time()
#START: Performance critical part
for x in range(10 * 1000 * 1000):  # 10 million iterations, matching the Java loop
    haversine(36.12, -86.67, 33.94, -118.40)
#END: Performance critical part
elapsed_time = time.time() - start_time
print("Python elapsed time = " + str(elapsed_time)+" seconds")
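
(As a side note, the standard library's `timeit` module gives steadier timings than manual `time.time()` differencing; a self-contained sketch, restating the function and using a smaller loop count for illustration:)

```python
import timeit
from math import radians, sin, cos, sqrt, asin

def haversine(lat1, lon1, lat2, lon2):
    # Same formula as above
    R = 6372.8  # Earth radius in kilometers
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    a = sin(dLat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dLon / 2) ** 2
    return 2 * R * asin(sqrt(a))

# timeit repeats the statement in a tight loop and picks an
# appropriate timer, avoiding common measurement pitfalls.
elapsed = timeit.timeit(
    "haversine(36.12, -86.67, 33.94, -118.40)",
    globals=globals(),
    number=100_000,
)
print("Python elapsed time for 100k calls =", elapsed, "seconds")
```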

Java implementation:

import java.util.Date;

public class HaversineTest {
    public static final double R = 6372.8; // In kilometers
    public static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        lat1 = Math.toRadians(lat1);
        lat2 = Math.toRadians(lat2);

        double a = Math.pow(Math.sin(dLat / 2),2) + Math.pow(Math.sin(dLon / 2),2) * Math.cos(lat1) * Math.cos(lat2);
        double c = 2 * Math.asin(Math.sqrt(a));
        return R * c;
    }
    public static void main(String[] args) {
        int loopCount = 10 * 1000 * 1000;
        long startTime = new Date().getTime();
        for (int i = 0; i < loopCount; i++) {
            haversine(36.12, -86.67, 33.94, -118.40);
        }
        long elapsedTime = new Date().getTime() - startTime;
        System.out.println("Java elapsed time = " + elapsedTime + " ms");
    }
}

Versions:

Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

Outputs:

Java elapsed time = 229 ms

Python elapsed time = 15.028019428253174 seconds

Can I improve the Python performance?

MSeifert
Shamit Verma
    well, Java is compiled to bytecode while python is interpreted language. – Yoav Sternberg Apr 27 '17 at 17:38
  • 3
    If you want something faster with Python, you should look at using the `numpy` library and vectorising the calculation. See [Fast Haversine Approximation (Python/Pandas)](http://stackoverflow.com/questions/29545704/fast-haversine-approximation-python-pandas). It's kinda futile to hope for pure Python to compete with any compiled language, [this](https://benchmarksgame.alioth.debian.org/u64q/python.html) shows its failings against Java in multiple areas. They are for different things. – roganjosh Apr 27 '17 at 17:40
  • this is more ranty blog post than question, and is asking for a code review which would be more appropriate on http://codereview.stackoverflow.com –  Apr 27 '17 at 17:42
  • Yes but the numpy math functions are generally slower than the math – Luca Di Liello Apr 27 '17 at 17:42
  • 1
    @YoavSternberg: Python is also compiled to bytecode. Bytecode compilation isn't the deciding factor; JIT compilation, unboxed primitives, and static dispatch matter more. – user2357112 Apr 27 '17 at 17:43
  • @LucaDiLiello where is the evidence for that? Are you trying to just use the numpy math functions on python `dtypes`? – roganjosh Apr 27 '17 at 17:44
  • 6
    I'm voting to close this question as off-topic because questions with working code looking to improve the speed or efficiency of the code belong on CodeReview.StackExchange. Stack Overflow is for questions about code that does not work in some way. – TylerH Apr 27 '17 at 17:44
  • @TylerH: I'm on the fence with the closure vote, since OP also asked *why* the Python code is slower. – Prune Apr 27 '17 at 17:49
  • 2
    @Prune I don't see anywhere in the question or comments where OP asks why it's slower. The only question from OP I see is "**Can something be changed to improve Python performance ?**" – TylerH Apr 27 '17 at 17:51
  • 1
    [StackOverflow](http://stackoverflow.com/questions/3650194/are-numpys-math-functions-faster-than-pythons) [ShockSolutions](http://www.shocksolution.com/2009/01/optimizing-python-code-for-fast-math/) @roganjosh I think math is fast on basic operations. – Luca Di Liello Apr 27 '17 at 17:53
  • @LucaDiLiello ok, so it is contextual (which is why I asked if you were referring to normal python data types). Also, the benchmark in the accepted answer to the stack overflow question seems plain wrong. Why is the `import` also being timed on each loop? – roganjosh Apr 27 '17 at 17:57
  • 2
    @TylerH: I see; my interpretation of the overall thrust ... and my error, I believe. So much for my fence-sitting; I'm the 4th closure vote. – Prune Apr 27 '17 at 17:58
  • @roganjosh you are right, but it seems the other link makes clear tests. – Luca Di Liello Apr 27 '17 at 18:02
  • I'm impressed that the JVM fails so badly at hoisting the computation out of the loop. Everything constant folds, so there's no reason it should run any of it at all. But you can just use PyPy for a fairer comparison, since that's a Python interpreter that actually tries to be fast. – Veedrac Apr 28 '17 at 03:23
  • Hi, All. Question is about "How can this code be improved" ? @MSeifert answer seems quite promising. – Shamit Verma Apr 28 '17 at 04:33
  • @roganjosh , what can be changed to avoid import in timing ? – Shamit Verma Apr 28 '17 at 04:37
  • Java uses the fact that you use the same constants (inlines or something). If I put these constants into an array of doubles and call the haversine function with those, or if I pass them from some other variable source (which is how it happens in real life I'd say), it runs in 2000 ms instead of 300 ms on my machine. – starikoff Apr 28 '17 at 10:46
  • 1
    I think you could improve performance by some factor after using `cython` – Mahadeva Jan 17 '19 at 06:06
  • Take a look here https://pastebin.com/ahYRUvfC . You could try to measure its performance against Java version – Mahadeva Jan 18 '19 at 04:57

2 Answers


There are two major bottlenecks:

  • The loop
  • The function calls (the haversine function and the math functions).
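
To put a rough number on the second bullet: even in pure CPython you can shave some call overhead by inlining `haversine` into the loop and binding the math functions to local names (a micro-optimization sketch; expect tens of percent, not a 60x recovery):

```python
from math import radians, sin, cos, sqrt, asin
import time

def run(n, radians=radians, sin=sin, cos=cos, sqrt=sqrt, asin=asin):
    # Default arguments bind the math functions as locals, which
    # CPython resolves faster than module-level globals, and the
    # inlined body avoids one Python-level call per iteration.
    R = 6372.8  # Earth radius in kilometers
    d = 0.0
    for _ in range(n):
        dLat = radians(33.94 - 36.12)
        dLon = radians(-118.40 - -86.67)
        lat1 = radians(36.12)
        lat2 = radians(33.94)
        a = sin(dLat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dLon / 2) ** 2
        d = 2 * R * asin(sqrt(a))
    return d

start = time.time()
result = run(1_000_000)
print("pure-Python inlined loop:", time.time() - start, "s")
```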

If you operate on whole arrays, the functions only need to be called once, because the loop is vectorized over the arrays; the net result is that the code runs 30 times faster on my computer (the code is taken from Prune's answer, but changed so that it works with numpy arrays):

import numpy as np

R = 6372.8 # Earth radius in kilometers

def haversine(lat1, lon1, lat2, lon2):
    dLat = np.radians(lat2 - lat1)
    dLon = np.radians(lon2 - lon1)
    lat1 = np.radians(lat1)
    lat2 = np.radians(lat2)
    sinLat = np.sin(dLat/2)
    sinLon = np.sin(dLon/2)

    a = sinLat * sinLat + np.cos(lat1) * np.cos(lat2) * sinLon * sinLon
    c = 2 * np.arcsin(np.sqrt(a))

    return R * c

haversine(np.ones(10000000) * 36.12,
          np.ones(10000000) * -86.67,
          np.ones(10000000) * 33.94,
          np.ones(10000000) * -118.40)

But that creates a lot of huge temporary arrays. You can avoid them, and still use the plain math functions, with numba:

from math import radians, sin, cos, sqrt, asin
from numba import njit
import numpy as np

@njit
def haversine(lat1, lon1, lat2, lon2):

    R = 6372.8 # Earth radius in kilometers

    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    lat1 = radians(lat1)
    lat2 = radians(lat2)

    a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
    c = 2*asin(sqrt(a))

    return R * c

@njit
def haversine_loop():
    res = np.empty(10*1000*1000, dtype=np.float64)
    for x in range(0, 10*1000*1000): 
        res[x] = haversine(36.12, -86.67, 33.94, -118.40)
    return res

haversine_loop()

That actually gives a 50x improvement over the numpy code (so 150 times faster in the end). But you need to check whether such an approach is feasible in your case (numba is not a lightweight dependency!).
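
If numba is not an option, the temporary-array overhead of the plain numpy version can also be trimmed with the ufunc `out=` argument; a sketch (the buffer-reuse pattern and the name `haversine_inplace` are illustrative):

```python
import numpy as np

R = 6372.8  # Earth radius in kilometers

def haversine_inplace(lat1, lon1, lat2, lon2):
    # Reuse two scratch buffers instead of allocating a fresh
    # array for every intermediate result.
    a = np.radians(lat2 - lat1)   # dLat in radians
    np.divide(a, 2, out=a)
    np.sin(a, out=a)
    np.multiply(a, a, out=a)      # sin(dLat/2)**2

    b = np.radians(lon2 - lon1)   # dLon in radians
    np.divide(b, 2, out=b)
    np.sin(b, out=b)
    np.multiply(b, b, out=b)      # sin(dLon/2)**2
    np.multiply(b, np.cos(np.radians(lat1)), out=b)
    np.multiply(b, np.cos(np.radians(lat2)), out=b)

    np.add(a, b, out=a)
    np.sqrt(a, out=a)
    np.arcsin(a, out=a)
    return np.multiply(a, 2 * R, out=a)
```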

MSeifert

How fast do you want it to be? One decorator and you can get a 6X speedup.

If you have a large number of these to do then the chances are that numexpr or other numpy approaches would be faster still. So, it depends on what you need it for.

from numba import jit

@jit(nopython=True)
def haversine(lat1, lon1, lat2, lon2):
    ...
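
For the bulk case mentioned above, a numexpr version might look roughly like this (a sketch, assuming `numexpr` is installed; `haversine_ne` is an illustrative name):

```python
import numpy as np
import numexpr as ne

R = 6372.8  # Earth radius in kilometers

def haversine_ne(lat1, lon1, lat2, lon2):
    # numexpr compiles the whole expression into one blocked,
    # multithreaded loop, avoiding large NumPy temporaries.
    return ne.evaluate(
        "2 * R * arcsin(sqrt("
        "sin((lat2r - lat1r) / 2) ** 2"
        " + cos(lat1r) * cos(lat2r) * sin((lon2r - lon1r) / 2) ** 2))",
        local_dict={
            "R": R,
            "lat1r": np.radians(lat1), "lat2r": np.radians(lat2),
            "lon1r": np.radians(lon1), "lon2r": np.radians(lon2),
        },
    )
```
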
Phil Cooper