Java vs. Python specific code-fragment performance improvement

Question

I have a question regarding performance of a specific piece of code in Java and Python.

Algorithm:
I am generating random N-dimensional points and then for all points under a certain distance threshold of each other I do some processing. The processing itself is not important here as it does not impact total execution time. Generating the points also takes a fraction of a second in both cases, so I am only interested in the part that does the comparing.

Execution times:
For a fixed input of 3000 points and 2 dimensions, Java does this in 2 to 4 seconds while Python takes anywhere between 15 and 200 seconds.

I am a little skeptical about the Python execution times. Is there anything I'm missing in this Python code? Are there any algorithmic improvement suggestions (e.g. pre-allocating/reusing memory, a way of lowering Big-Oh complexity, etc.)?

Java

double random_points[][] = new double[number_of_points][dimensions];
for(i = 0; i < number_of_points; i++)
  for(d = 0; d < dimensions; d++)
    random_points[i][d] = Math.random();

double p1[], p2[];
for(i = 0; i < number_of_points; i++)
{
  p1 = random_points[i];
  for(j = i + 1; j < number_of_points; j++)
  {
    p2 = random_points[j];

    double sum_of_squares = 0;
    for(d = 0; d < DIM_; d++)
      sum_of_squares += (p2[d] - p1[d]) * (p2[d] - p1[d]);

    double distance = Math.sqrt(ss);
    if(distance > SOME_THRESHOLD) continue;

    //...else do something with p1 and p2

  }
}

Python 3.2

random_points = [[random.random() for _d in range(0,dimensions)] for _n in range(0,number_of_points)]

for i, p1 in enumerate(random_points):
  for j, p2 in enumerate(random_points[i+1:]):
    distance = math.sqrt(sum([(p1[d]-p2[d])**2 for d in range(0,dimensions)]))
    if distance > SOME_THRESHOLD: continue

    #...else do something with p1 and p2

I am surprised it takes 2-4 second on Java. I would expect much lower once the JVM warms up. Running this with 3000 points and 2 dimensions it take about 40 micro-second on my machine. — Peter Lawrey, Apr 18 '11 at 15:45
You might try [this](http://stackoverflow.com/questions/4295799/how-to-improve-performance-of-this-code/4299378#4299378) with the python version. — Mike Dunlavey, Apr 18 '11 at 16:00
You're wasting quite some memory by using `range` over `xrange` and slicing (which creates shallow copies) instead of using `itertools.islice`. — , Apr 18 '11 at 16:02
Thanks for the input guys. The question is mostly about Python and whether the execution time could be improved to be in Java's range. I am aware of the `range` vs `xrange`, but I forgot to mention I'm running Python 3.2, so that makes it a non-issue and I'll update my post with that in mind. @Mike - definitely a nice general practice tip. — Vuk, Apr 18 '11 at 17:52

score 5 · Accepted Answer · edited Feb 06 '23 at 19:00

You might want to consider using numpy.

I've just tried the following:

import numpy
from scipy.spatial.distance import pdist
D=2
N=3000
p=numpy.random.uniform(size=(N,D))
dist=pdist(p, 'euclidean')

The last line computes the distance matrix (this is equivalent to computing distance in your code for every pair of points). On my computer it takes about 0.07s.

The main disadvantage of this method is that it requires O(n^2) memory for the distance matrix. If that's a problem, the following may be a better alternative:

for i in xrange(1, N):
  v = p[:N-i] - p[i:]
  dist = numpy.sqrt(numpy.sum(numpy.square(v), axis=1))
  for j in numpy.nonzero(dist > 1.4)[0]:
    print j, i+j

For N=3000, this takes ~0.33s on my computer.

That definitely reduced running time. Cheers. – Vuk Apr 18 '11 at 17:53 — Vuk, Apr 18 '11 at 17:53

Peter Lawrey · Answer 2 · 2011-04-18T16:21:09.913

If I take 30K points and 5 dimensions, which is 100x more work.

int number_of_points = 30000;
int dimensions = 5;
double SOME_THRESHOLD = 0.1;

long start = System.nanoTime();
double random_points[][] = new double[number_of_points][dimensions];
for (int i = 0; i < number_of_points; i++)
    for (int d = 0; d < dimensions; d++)
        random_points[i][d] = Math.random();

double p1[], p2[];
Comparator<double[]> compareX = new Comparator<double[]>() {
    @Override
    public int compare(double[] o1, double[] o2) {
        return Double.compare(o1[0], o2[0]);
    }
};
Arrays.sort(random_points, compareX);

double[] key = new double[dimensions];
int count = 0;
for (int i = 0; i < number_of_points; i++) {
    p1 = random_points[i];
    key[0] = p1[0] + SOME_THRESHOLD;
    int index = Arrays.binarySearch(random_points, key, compareX);
    if (index < 0) index = ~index;
    NEXT: for (int j = i + 1; j < index; j++) {
        p2 = random_points[j];

        double sum_of_squares = 0;
        for (int d = 0; d < dimensions; d++) {
            sum_of_squares += (p2[d] - p1[d]) * (p2[d] - p1[d]);
            if (sum_of_squares > SOME_THRESHOLD * SOME_THRESHOLD) 
                continue NEXT;
        }

        //...else do something with p1 and p2
        count++;
    }
}
long time = System.nanoTime() - start;
System.out.println("Took " + time / 1000 / 1000 + " ms. count= " + count);

Prints

Took 1549 ms. count= 20197

score 1 · Answer 3 · answered Apr 18 '11 at 16:35

Does speed really matter? Here are a couple obvious speed-ups:

Don't compute square root. Just square your threshold and compare against the squared threshold.
Sort your points along one dimension (the outer one in your loop). When two points i and j are farther than than your threshold on just this dimension, then further incrementing j will only produce points farther than this threshold, and you can continue the outer loop.

There may be other algorithmic speed ups, even the above is still O(n^d).

Java vs. Python specific code-fragment performance improvement

3 Answers3