I have a question regarding performance of a specific piece of code in Java and Python.
Algorithm:
I am generating random N-dimensional points and then for all points under a certain distance threshold of each other I do some processing. The processing itself is not important here as it does not impact total execution time. Generating the points also takes a fraction of a second in both cases, so I am only interested in the part that does the comparing.
Execution times:
For a fixed input of 3000 points and 2 dimensions, Java does this in 2 to 4 seconds while Python takes anywhere between 15 and 200 seconds.
I am a little skeptical about the Python execution times. Is there anything I'm missing in this Python code? Are there any algorithmic improvement suggestions (e.g. pre-allocating/reusing memory, a way of lowering Big-Oh complexity, etc.)?
Java
double random_points[][] = new double[number_of_points][dimensions];
for(i = 0; i < number_of_points; i++)
for(d = 0; d < dimensions; d++)
random_points[i][d] = Math.random();
double p1[], p2[];
for(i = 0; i < number_of_points; i++)
{
p1 = random_points[i];
for(j = i + 1; j < number_of_points; j++)
{
p2 = random_points[j];
double sum_of_squares = 0;
for(d = 0; d < DIM_; d++)
sum_of_squares += (p2[d] - p1[d]) * (p2[d] - p1[d]);
double distance = Math.sqrt(ss);
if(distance > SOME_THRESHOLD) continue;
//...else do something with p1 and p2
}
}
Python 3.2
random_points = [[random.random() for _d in range(0,dimensions)] for _n in range(0,number_of_points)]
for i, p1 in enumerate(random_points):
for j, p2 in enumerate(random_points[i+1:]):
distance = math.sqrt(sum([(p1[d]-p2[d])**2 for d in range(0,dimensions)]))
if distance > SOME_THRESHOLD: continue
#...else do something with p1 and p2