I'm looking for some kind of "Domino sort" algorithm that sorts a list of two-sided items based on the similarity of "tangent" sides of subsequent items.
Suppose the following list where items are represented by 2-tuples:
>>> items
[(0.72, 0.12),
(0.11, 0.67),
(0.74, 0.65),
(0.32, 0.52),
(0.82, 0.43),
(0.94, 0.64),
(0.39, 0.95),
(0.01, 0.72),
(0.49, 0.41),
(0.27, 0.60)]
The goal is to sort that list such that the sum of squared differences of the tangent ends of each two subsequent items (the loss) is minimal:
>>> loss = sum(
... (items[i][1] - items[i+1][0])**2
... for i in range(len(items)-1)
... )
For the above example this can be computed by just working through all possible permutations but for lists with more items this becomes quickly unfeasible (O(n!)
).
The approach of selecting the best match step-by-step as sketched here
def compute_loss(items):
return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))
def domino_sort(items):
best_attempt = items
best_score = compute_loss(best_attempt)
for i in range(len(items)):
copy = [x for x in items]
attempt = [copy.pop(i)]
for j in range(len(copy)):
copy = sorted(copy, key=lambda x: abs(x[0] - attempt[-1][1]))
attempt.append(copy.pop(0))
score = compute_loss(attempt)
if score < best_score:
best_attempt = attempt
best_score = score
return best_attempt, best_score
gives the following result with a loss of 0.1381
:
[(0.01, 0.72),
(0.72, 0.12),
(0.11, 0.67),
(0.74, 0.65),
(0.49, 0.41),
(0.39, 0.95),
(0.94, 0.64),
(0.82, 0.43),
(0.32, 0.52),
(0.27, 0.6)]
This is however not the best solution which would be
[(0.01, 0.72),
(0.82, 0.43),
(0.27, 0.6),
(0.49, 0.41),
(0.32, 0.52),
(0.39, 0.95),
(0.94, 0.64),
(0.72, 0.12),
(0.11, 0.67),
(0.74, 0.65)]
with a loss of 0.0842
. Obviously the above algorithm performs well for the first few items however the differences for the last ones grow so large that they dominate the loss.
Is there any algorithm that can perform this kind of sort in acceptable time dependence (feasible for lists of hundreds of items)?
If it's not possible to do this kind of sort exactly in less than O(n!)
are there any approximate approaches that are likely to return a good score (small loss)?