Assuming you have a list of points A
through J
, then the matrix pairing all of those up looks like this:
AA AB AC AD AE AF AG AH AI AJ
BA BB BC BD BE BF BG BH BI BJ
CA CB CC CD CE CF CG CH CI CJ
DA DB DC DD DE DF DG DH DI DJ
EA EB EC ED EE EF EG EH EI EJ
FA FB FC FD FE FF FG FH FI FJ
GA GB GC GD GE GF GG GH GI GJ
HA HB HC HD HE HF HG HH HI HJ
IA IB IC ID IE IF IG IH II IJ
JA JB JC JD JE JF JG JH JI JJ
That's what your loop currently calculates. However, the distance AB
and BA
is equal, and the distances on the center line (AA
, BB
, ...) are always zero.
We can cut out workload in half (even less than half, from n^2
to n^2 / 2 - n
) by calculating only the distances between points where x < y
in the matrix.
AB AC AD AE AF AG AH AI AJ
BC BD BE BF BG BH BI BJ
CD CE CF CG CH CI CJ
DE DF DG DH DI DJ
EF EG EH EI EJ
FG FH FI FJ
GH GI GJ
HI HJ
IJ
The blanks can be filled easily by mirroring the upper triangle. Staying within the example, this:
addresses = ['A','B','C','D','E','F','G','H','I','J']
distances = []
for x, a in enumerate(addresses):
row = []
distances.append(row)
for y, b in enumerate(addresses):
if x < y:
row.append(a + b) # actually calculate something
elif x == y:
row.append('--') # that's always 0
else:
row.append(distances[y][x]) # we already calculated that
for row in distances:
print(' '.join(row))
gives us this:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
The next step up in speed could be multi-threading, but maybe this optimization already is good enough for your use case.
A multi-threaded implementation of the above could look like this (it's probably not pythonic in more ways than one, but it does the job):
from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool
# credit https://stackoverflow.com/a/54802737
def chunks(l, n):
"""Yield n number of striped chunks from l."""
for i in range(0, n):
yield l[i::n]
def calculate_travel_distance(a, b):
return a + b
def calculate_distance_matrix(addresses):
# prepare distance matrix, list of lists with n^2 slots
distance_matrix = [['--' for a in addresses] for b in addresses]
# the workload is the upper matrix triangle (where x < y)
# since we're multi-threading, also remember the x,y position
workload = [((a, b),(x, y)) for y, b in enumerate(addresses) for x, a in enumerate(addresses) if x < y]
# worker function
def worker(chunk):
return [(calculate_travel_distance(*points), matrix_pos) for points, matrix_pos in chunk]
# distribute workload over available CPU cores
pool = ThreadPool(cpu_count())
result_chunks = pool.map(worker, chunks(workload, cpu_count()))
# distribute result chunks into their slots
for result_chunk in result_chunks:
for result, matrix_pos in result_chunk:
x, y = matrix_pos
distance_matrix[x][y] = result
distance_matrix[y][x] = result
return distance_matrix
addresses = ['A','B','C','D','E','F','G','H','I','J']
distaince_matrix = calculate_distance_matrix(addresses)
for row in distaince_matrix:
print(' '.join(row))
It prints the same thing:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --