2

In my problem, there are multiple sets of points in the 2D space.
i.e.) 10 points in group 1, 10 points in group 2, 10 points in group 3.

I need to calculate all distances between the points in every group. i.e.)

distance (1,2,1,1): the distance between point 1 in group 1 and point 1 in group 2

distance (1,2,1,2): the distance between point 1 in group 1 and point 2 in group 2

distance (1,2,1,3): the distance between point 1 in group 1 and point 3 in group 2

....

distance (2,3,10,10): the distance between point 10 in group 2 and point 10 in group 3

The distance covers all points in all groups.
Currently, I used 4-four loops as below but it takes too much time.

distt = [[] for i in range(ball_number)]
for m in range(group_number-1):
    for n in range(m+1, group_number):
        for i in range(ball_number):
            for j in range(ball_number):
                    distt[i].insert(j, distance between point[i] and point[j])

One guy advised me something like...."use multiple threads (same numbers of the group) and class, and calculate all distances of a single group with one thread" but I cannot figure out how to do that.

Does anybody can help me for the fast calculation method with multithreading or any advise?

Anmol Singh Jaggi
  • 8,376
  • 4
  • 36
  • 77
  • 1
    Before going multithreaded, I would try numpy, and then numpy and numba . This is the kind of stuff that numba helps a lot. There are lots of discussions on the topic on stack overflow, for instance: https://stackoverflow.com/questions/56126913/efficient-implementation-of-pairwise-distances-computation-between-observations . Also after one thread is fast you can always then look at multithreading if speed gain was not enough. – OddNorg Aug 11 '20 at 21:55
  • 1
    Will multiple threads actually make it faster? In [this question](https://stackoverflow.com/q/62154869) the CPython's Global Interpreter Lock is mentioned and how no performance improvement will be expected by using multiple threads in CPU-bound tasks. Instead, using multiple processes is suggested. – Hernán Alarcón Aug 11 '20 at 23:05

1 Answers1

0

Assuming there are just 3 sets of points, you can create 3 processes.

These 3 processes will compute the distance between every pair of points betwen sets (1,2), (2,3) and (1,3) respectively.

In general, if there are n sets, then we can spawn a process pool with min(num_processor, n*(n-1) / 2) child processes to get maximum parallelism, where num_processor is the number of physical cores in the computer.

Not that we need multiprocessing rather than multithreading as this is a CPU bound operation and not an IO-bound one.

Anmol Singh Jaggi
  • 8,376
  • 4
  • 36
  • 77