8

I have (let's say) a 100 points. I want to generate a path between them, the shorter, the better. This is the Travelling salesperson problem. Since the TSP is NP-hard, I am satisfied with not finding a global solution. I method which gives a solution quickly & scales well.

Generate example points:

import numpy as np
points = np.random.RandomState(42).rand(100,2)

Generate distance matrix, where the i,j entry contains distance between point[i] and point[j]. Using this answer:

z = np.array([[complex(*c) for c in points]]) # notice the [[ ... ]]
distance_matrix = abs(z.T-z)

How do I continue, to find a shortest path, with at least locally minimal pathlength?


Some related questions:

This thread: Travelling Salesman in scipy provides code for finding a solution to the TSP. It works in an iterative way though, so if the number of points to visit is large, the script does not produce a solution in reasonable times.

This thread: How to solve the Cumulative Traveling Salesman Problem using or-tools in python? does not have a code answer, and is not focused on classical TSP.

This thread: Optimizing a Traveling Salesman Algorithm (Time Traveler Algorithm) provides iterative solutions to the problem (which means bad scaling)

zabop
  • 6,750
  • 3
  • 39
  • 84

4 Answers4

4

This solution uses Google's OR-Tools. Install ortools via python -m pip install --upgrade --user ortools.

from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp

Generate points:

import numpy as np
points = np.random.RandomState(42).rand(100,2)

Calcualte distance matrix as in question:

z = np.array([[complex(*c) for c in points]]) # notice the [[ ... ]]
distance_matrix_pre = abs(z.T-z)

Since the docs say:

Note: Since the routing solver does all computations with integers, the distance callback must return an integer distance for any two locations. If any of the entries of data['distance_matrix'] are not integers, you need to round either the matrix entries, or the return values of the callback, to integers. See Scaling the distance matrix for an example that shows how to avoid problems caused by rounding error.

I am going to multiply by a 10000 every entry in our distance matrix and round down to an int. If want to use more decimal places, multiply by a larger number.

distance_matrix = np.floor(distance_matrix_pre*10000).astype(int).tolist()

Keep following Google's tutorial:

# Create the routing index manager.
manager = pywrapcp.RoutingIndexManager(len(points),1,0)

# Create Routing Model.
routing = pywrapcp.RoutingModel(manager)

def distance_callback(from_index, to_index):
    """Returns the distance between the two nodes."""
    # Convert from routing variable Index to distance matrix NodeIndex.
    from_node = manager.IndexToNode(from_index)
    to_node = manager.IndexToNode(to_index)
    return distance_matrix[from_node][to_node]

transit_callback_index = routing.RegisterTransitCallback(distance_callback)

# Define cost of each arc.
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

The output depends on initial path-selection strategy used (similar to how a numerical solution can depend on inital guess). Here we specify initial strategy:

# Setting first solution heuristic.
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (
    routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)

In Google's words:

The code sets the first solution strategy to PATH_CHEAPEST_ARC, which creates an initial route for the solver by repeatedly adding edges with the least weight that don't lead to a previously visited node (other than the depot). For other options, see First solution strategy.

Solve the problem with initial strategy defined above (ie with PATH_CHEAPEST_ARC):

# Solve the problem.
solution = routing.SolveWithParameters(search_parameters)

Get a route:

if solution:
    index = routing.Start(0)
    route = [manager.IndexToNode(index)]
    while not routing.IsEnd(index):
        index = solution.Value(routing.NextVar(index))
        route.append(manager.IndexToNode(index))

Plot result:

plt.scatter(points[:,0],points[:,1])
for a, b in zip(route[:-1],route[1:]):
    x = points[[a,b]].T[0]
    y = points[[a,b]].T[1]
    plt.plot(x, y,c='r',zorder=-1)
plt.gca().set_aspect('equal')

enter image description here


The list of other initial strategies which can be used (source):

initial_strategy_list = \
['AUTOMATIC','PATH_CHEAPEST_ARC','PATH_MOST_CONSTRAINED_ARC','EVALUATOR_STRATEGY','SAVINGS','SWEEP','CHRISTOFIDES','ALL_UNPERFORMED','BEST_INSERTION','PARALLEL_CHEAPEST_INSERTION','LOCAL_CHEAPEST_INSERTION','GLOBAL_CHEAPEST_ARC','LOCAL_CHEAPEST_ARC','FIRST_UNBOUND_MIN_VALUE']

If want to increase confidence in the solution, could iterate over all of them and find the best one.

zabop
  • 6,750
  • 3
  • 39
  • 84
4

This is a examples based on simulated annealing (pip install frigidum). It is most likely slower than other solutions. I expect the posted Google OR to be better, but since it is a different approach might be of interest (albeit for learning/education).

import numpy as np
import frigidum

from frigidum.examples import tsp

points = np.random.RandomState(42).rand(100,2)

tsp.nodes = points
tsp.nodes_count = points.shape[0]

z = np.array([[complex(*c) for c in points]]) # notice the [[ ... ]]
distance_matrix = abs(z.T-z)

tsp.dist_eu = distance_matrix

We can use the following Simulated Annealing scheme (<30 seconds); (to get a better solution, get alpha closer to .999, and/or increate repeats with cost of longer calculation)

local_opt = frigidum.sa(random_start=tsp.random_start,
           objective_function=tsp.objective_function,
           neighbours=[tsp.euclidian_bomb_and_fix, tsp.euclidian_nuke_and_fix, tsp.route_bomb_and_fix, tsp.route_nuke_and_fix, tsp.random_disconnect_vertices_and_fix],
           copy_state=frigidum.annealing.naked,
           T_start=5,
           alpha=.8,
           T_stop=0.001,
           repeats=10**2,
           post_annealing = tsp.local_search_2opt)

local_opt is a tuple with its first element being the solution, and second element the objective value (route length). The the end it will output statistics and the minimum found; (this is not the one plotted).

(Local) Minimum Objective Value Found: 
   7.46016765

I used zabop plot code to plot the solution;

import matplotlib.pyplot as plt

plt.scatter(points[:,0],points[:,1])

route = local_opt[0]
for a, b in zip(route[:-1],route[1:]):
    x = points[[a,b]].T[0]
    y = points[[a,b]].T[1]
    plt.plot(x, y,c='r',zorder=-1)
plt.gca().set_aspect('equal')

https://i.stack.imgur.com/nXqKO.png https://i.stack.imgur.com/bNfdA.jpg

Willem Hendriks
  • 1,267
  • 2
  • 9
  • 15
3

tl;dr : YMMV, but I had the best experience with tsp-solver2.

A number of Python TSP solvers can be installed through pip, just search for either "TSP" or "salesman" on Pypi.org.

Unless you’re looking for a particular algorithm, choosing the right one without a benchmark isn’t quite easy (although many projects lack documentation, which helps filtering).

I tried a few of them, here is my experience.

Note that for my usecase it is usually straightforward to check for the optimal solution, since I need to reorder geocoordinates along a path, generally quite simple and limited to a few hundreds of points.

Timings don’t make sense by themselves and are only provided for mutual comparison, for a 100 points sample.

ortools

Quite verbose to use and didn’t work out of the box, so I didn’t insist.

tsp-c

Raised libstdc++ conflicts I could not solve since I am not an admin on my Dataiku instance.

python-tsp

As of today, provides two exact solvers and two metaheuristic methods :

  • brute force : didn’t try ;
  • dynamic programming : supposed to be faster but used too much memory for me, and was quite slow anyway ;
  • simulated annealing : quite fast (> 10 s), but didn’t find the optimal path ;
  • local search : much faster (~1 s), but also failed to find the optimal path.

Usage :

#from python_tsp.exact import solve_tsp_brute_force as solve_tsp
#from python_tsp.exact import solve_tsp_dynamic_programming as solve_tsp
#from python_tsp.heuristics import solve_tsp_simulated_annealing as solve_tsp
from python_tsp.heuristics import solve_tsp_local_search as solve_tsp
permutation, distance = solve_tsp(distances)

fast-tsp

Uses a local solver. Blazing fast (~100 ms), but very bad results. Needs the distances as integers.

Usage :

import fast_tsp
tour = fast_tsp.find_tour(distances)

tsp-solver2

As often, trust the most humble ones. The project is titled "Suboptimal TSP Solver" and its README is full of warnings about not being tuned for performance and providing suboptimal solutions, yet in my use case it was even faster than fast-tsp (~20 ms) and the one to find the optimal solutions (didn’t fail a single of my paths).

It uses a greedy algorithm followed by optimization (see homepage for details).

Usage :

from tsp_solver.greedy import solve_tsp
path = solve_tsp(distances)

Note that these solvers require a matrix of distances between points. For geocoordinates points, one can create it easily with

from sklearn.metrics.pairwise import haversine_distances
distances = haversine_distances(df[["latitude","longitude"]].values)
Skippy le Grand Gourou
  • 6,976
  • 4
  • 60
  • 76
  • I am not able to reproduce your conclusion, namely, that `tsp-solver2` should perform better than all the other libraries mentioned including `fast-tsp`. For the example the original asker gave, I get a path cost of ~7.54 with `tsp-solver2` versus a path cost of ~7.39 with `fast-tsp`. I wrote another answer comparing the performance of these two libraries. – shmulvad Aug 16 '23 at 05:47
  • @shmulvad Well as I mentioned, my experience is based on actual geopaths to reorder. I guess that makes a large difference with regard to random points. Unfortunately, many paths went plain wrong. – Skippy le Grand Gourou Aug 22 '23 at 09:02
1

One solution that works well, even for large instances, is fast-tsp. It uses a local solver behind the scenes, the main logic code is written in C++ (making it fast), and it allows you to set the maximum time you're willing to search for better solutions for.

You can install it through pip:

$ pip install fast-tsp

Then you can use it like this:

import fast_tsp
tour = fast_tsp.find_tour(distances)

To be as fast as possible, it uses integers for the distances. Another answer mentions that it is subpar compared to tsp-solver2. This is not my experience from testing. Let me give an example based on your setup:

import numpy as np
from tsp_solver.greedy import solve_tsp
from tsp_solver.util import path_cost
import fast_tsp

points = np.random.RandomState(42).rand(100, 2)
z = np.array([[complex(*c) for c in points]])
distance_matrix = np.abs(z.T - z)

# tsp-solver2
tour_tsp_solver = solve_tsp(distance_matrix)
path_cost(distance_matrix, tour_tsp_solver)  # 7.543497268

# fast-tsp (we first need to scale the distance matrix to integers < UINT16_MAX)
max_mult = (fast_tsp._UINT16_MAX / np.max(distance_matrix)) - 2
scaled_distance_matrix = np.array(np.rint(distance_matrix * max_mult), dtype=np.int32)
tour = fast_tsp.find_tour(scaled_distance_matrix, duration_seconds=0.01)
path_cost(distance_matrix, tour)  # 7.388272614749307 (may vary, but I consistently got < 7.54)

If I run the same as above, but for a much larger number of points, the difference is even bigger. Here are the same results but letting points = np.random.RandomState(42).rand(5000, 2):

Call Path cost Runtime
solve_tsp(distance_matrix) 54.0997 1m53s
fast_tsp.find_tour(scaled_distance_matrix) 52.7590 6s
fast_tsp.find_tour(scaled_distance_matrix, duration_seconds=10) 52.2410 14s
fast_tsp.find_tour(scaled_distance_matrix, duration_seconds=30) 52.0359 34s

Note that the fast_tsp is stochastic, so if you try to recreate the results, they might differ slightly.

Disclaimer: I'm the author of fast-tsp.

shmulvad
  • 636
  • 4
  • 15