4

I try to optimize a simple python algorithm I made that approximately solve the Traveling Salesman Problem :

import math
import random
import matplotlib.pyplot as plt
import datetime


#Distance between two point
def distance(point1, point2):
    return math.sqrt((point2[0]-point1[0])**2+(point2[1]-point1[1])**2)

#TSP TimeTraveler Algorithm
def TSP_TimeTraveler(Set_Points):
    print("Solving TSP")

    #For calculating execution time
    time_start = datetime.datetime.now()

    #Copy the set points
    points = Set_Points.copy()
    route = []

    #Take 3 points at random
    route.append(points.pop(random.randint(0,len(points)-1)))
    route.insert(0,points.pop(random.randint(0,len(points)-1)))
    route.insert(1,points.pop(random.randint(0,len(points)-1)))

    #Calulating the initial route length
    Length = distance(route[0],route[1]) + distance(route[1],route[-1]) + distance(route[-1],route[0])

    #Time Traveler Algorithm
    while len(points)>0 :
        print("Points left : ", len(points),'              ', end="\r")

        #Take a random point from the Set
        point = points.pop(random.randint(0,len(points)-1))

        ###############################################################################################################
        #### Finding the closest route segment by calculation all lengths posibilities and finding the minimum one ####
        ###############################################################################################################
        Set_Lengths = []
        for i in range(1,len(route)):
            #Set of Lengths when the point is on each route segment except the last one
            L = Length - distance(route[i-1],route[i]) + distance(route[i-1],point) + distance(point, route[i])
            Set_Lengths.append((i,L))

        #Adding the last length when the point is on the last segement
        L = Length - distance(route[-1],route[0]) + distance(route[-1],point) + distance(point, route[0])
        Set_Lengths.append((0,L))
        ###############################################################################################################
        ###############################################################################################################

        #Sorting the set of lengths
        Set_Lengths.sort(key=lambda k: k[1])

        #Inserting the point on the minimum length segment
        route.insert(Set_Lengths[0][0], point)

        #Updating the new route length
        Length = Set_Lengths[0][1]

    #Connecting the start point with the finish point
    route.append(route[0])

    #For calculating execution time
    time_end = datetime.datetime.now()
    delta = (time_end-time_start).total_seconds()

    print("Points left : ", len(points),' Done              ',)
    print("Execution time : ", delta, "secs")

    return route

#######################
#Testing the Algorithm#
#######################

#Size of the set
size = 2520

#Generating a set of random 2D points
points = []
for i in range(size):
    points.append([random.uniform(0, 100),random.uniform(0, 100)])

#Solve TSP
route = TSP_TimeTraveler(points)

#Plot the solution
plt.scatter(*zip(*points),s=5)
plt.plot(*zip(*route))
plt.axis('scaled')
plt.show()

The algorithm operate in 3 simple steps :

1/ First step I take 3 points at random from the points set and connect them as the initial route.

2/ Then each next step, I take a point at random from the set of points left. And try to find the closest segment of the route i have and connect it to it.

3/ I keep repeating step 2/ until the set of points left is empty.

Here is a gif of how the algorithm solve a set of 120 points : TimeTravelerAlgorithm.gif

I give it the name "Time Traveler" because it's operate like a greedy salesman algorithm. But instead traveling to the closest new city in the present, the greedy salesman time travel to the past to the closest city he had already visited and go visit that new city then continue his normal route.

The time traveler start a route of 3 cities, and the traveler add a new city each step in his past, until he reach a present where he visited all the cities and returned to his home city.

The algorithm give decent solutions fast for small set of points. Here is the execution time for each number of sets, all are made on a 2.6GHz dual-core Intel Core i5 processor Macbook :

  • 120 points in around 0.03 secs
  • 360 points in around 0.23 secs
  • 2520 points in around 10 secs
  • 10 000 points in around 3 mins
  • 100 000 points in around 5 hours (Solution Map)

The algorithm is far from being optimized, because in some cases it gives cross routes which is suboptimal. And It's all made in pure python. Maybe using numpy or some advance library or even GPU can speed up the program.

I want your review and help on how to optimize it. I try to approximately solve without cross routes for set of points that can be extremely large (from 1 million to 100 billions points).

PS: My algorithm and codes are open. People from internet, feel free to use it in any project or any research you have.

  • 1
    You could do comparisons for distance squared, avoiding computing sqrt(). – Severin Pappadeux Oct 05 '20 at 00:19
  • 1
    There's a somewhat similar algorithm out there, but I can't recall what it is called. Performance optimization would be to use a method that allows you to determine the nearest point in the data in `O(log h)` instead of `O(h)`, where `h` is the current points in the solution. Likely a KD-tree or something. Also implement 2- or 3-opt to get rid of the crosses. – Nuclearman Oct 05 '20 at 07:47
  • 1
    What's the time complexity of the algorithm, and what of the implementation? If your implementation in slower, then you might have an issue with data structure and your assumptions of time complexities of operations on them (e.g. inserting into a list is `O(n)`, from https://wiki.python.org/moin/TimeComplexity). – domen Oct 05 '20 at 08:13
  • The I believe the complexity is either O(n^2) or O(n^3). I tried to execute the code on PyPy and it run a lot faster. The 100 000 points case that took 5 hours, PyPy solved it in 13 min. About the inserting O(n), will using Linked List will optmize the O(n) into O(1) from pop and insert? – Yoshi Takeshi Oct 05 '20 at 10:24
  • 1
    If you aren't aware of them already, I would recommend running your algorithm against the "standard" TSP benchmark Test Data cases and see how they do. http://www.math.uwaterloo.ca/tsp/data/index.html (here is a PDF with the TSPLIB format definition http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp95.pdf) – Nuclearman Oct 23 '20 at 19:08
  • @Nuclearman Thanks, I tested on a small case XQL662 (http://www.math.uwaterloo.ca/tsp/vlsi/xql662.tour.html), in the site it said it the optimum is 2513 solved in 212.5 sec at 2002 with Concorde with a 1.6Ghz. My algorithm got a 2936 in 3sec (standard python3 in i5 2.3Gghz) which is +16% more than that optimum route. (my result https://ibb.co/Ln8VMwQ) I noticed It always output the same route 2936, even if the initial points and added points are random. So maybe it stuck in a local minimum. – Yoshi Takeshi Oct 30 '20 at 17:25
  • 1
    Hmmm, try http://www.math.uwaterloo.ca/tsp/world/zitour.html instead. The VLSI are tricker to optimize because they can be equidistant in some cases (they are basically circuit boards), so if you don't handle cases where AB has an equal distance to BC, you can run into issues on that point set. Recommend testing it on other benchmark instances as well. See where you go the most wrong. I can see how the algorithm went wrong, but 16% isn't bad for an approximation. Might help if you bump up the 2-opt for a 3-opt. I would implement and see if it helps much. – Nuclearman Oct 31 '20 at 05:28
  • 1
    Also you might want to consider adding a "remove" city logic to handle "bad additions". IE: Add X cities (say X=2), then remove the Xmost expensive cities (longest AB BC distance). I think that might help with the issues I see. For example, you could add 2 cities, then remove the most expensive one (overall) or add 5 cities then remove 3. In any case, always add more than you remove. – Nuclearman Oct 31 '20 at 05:35
  • 1
    Another optimization is to have more than one group of 3 nodes. IE: Split up the entire graph into N/3 groups, then aim to optimize moving 1 or more nodes from a group to another group to reduce the path length. However, this seems like it's the start of a different algorithm rather than an optimization of yours and thus may or may not be better. – Nuclearman Oct 31 '20 at 05:38
  • @Nuclearman Nice I really like the idea of adding and removing the most expensive cites. Adding 2 removing 1. Adding 5 removing 3. I need to test each case and see what is the optimum result of each one. – Yoshi Takeshi Nov 02 '20 at 11:42
  • For the splitting I had an idea to make an approach like the Quick Sort for list. you take 3 nodes at random. then you find nodes closest to each AB, BC, CD spliting the set in 3. and you solve each set with start point and finish point. (or you apply the same split to those sets again). This maybe will be very efficient to solve large set with millions and billions points. still I dont know about how optimum the result will be. – Yoshi Takeshi Nov 02 '20 at 11:44
  • For the 2-opt I still dont know how the implementation is done. I I just check if two segment cross without measuring any distances and then swap. I saw cases online even if there is no crosses in two segment the 2-opt swap is still done. I tried it with checking distances in stead of cross, but it always mess up my double linked route. I will appreciate If you can help me implement it without that problem. – Yoshi Takeshi Nov 02 '20 at 11:51
  • Sounds good on add/remove. Keep in mind, it'll affect the run time. By how much depends on how close the ratio of add/remove is to 1. Add 1 remove 1 obviously will never end. add 100 remove 99 will take I think 100x longer than add 1 remove 0. ||| Could work for splitting. ||| For 3 opt, you can find it online by searching `3 opt traveling salesman problem`, basically just 2 opt you have with 1 more node, but more complicated because of it. The add/remove city might be enough to avoid the need for it. – Nuclearman Nov 03 '20 at 13:26
  • Related [thread](https://stackoverflow.com/a/71196079/8565438). – zabop Feb 20 '22 at 15:55

2 Answers2

1

Thanks for the comments. I re-implemented the algorithm using Objects, Sets and Linked list. I also removed the square root from distance function . Now the code look more clean :

import math
import random
import datetime
import matplotlib.pyplot as plt

#Distance between two point
def distance(point1, point2):
    return (point2[0]-point1[0])**2 + (point2[1]-point1[1])**2

#Distance between two point
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class TSP_TimeTraveler():
    def __init__(self, dataval=None):
        self.count = 0
        self.position = None
        self.length = 0

    def get_position():
        return self.position

    def next_city():
        self.position = self.position.nextval
        return self.position

    #adding a city to the current route with Time Traveler Algorithm :
    def add_city(self, point):
        node = Node(point)
        if self.count <=0 :
            self.position = node
        elif self.count == 1 :
            node.nextval = self.position
            self.position.nextval = node
            self.length = 2*distance(self.position.dataval,node.dataval)
        else : 

            #Creating the traveler
            traveler = self.position

            c = traveler.dataval #current position
            n = traveler.nextval.dataval #next position

            #Calculating the length of adding the city to the path
            Min_L = self.length-distance(c,n)+distance(c,node.dataval)+distance(node.dataval,n)
            Min_Node = traveler

            traveler = traveler.nextval

            while traveler != self.position :
                c = traveler.dataval #current position
                n = traveler.nextval.dataval #next position

                #Calculating the length of adding the city to the path
                L = self.length-distance(c,n)+distance(c,node.dataval)+distance(node.dataval,n)

                #Searching the path to the of city with minimum length
                if L < Min_L :
                    Min_L = L
                    Min_Node = traveler

                traveler = traveler.nextval


            #Adding the city to the minimum path
            node.nextval = Min_Node.nextval
            Min_Node.nextval = node
            self.length = Min_L

        #Incrementing the number of city in the route
        self.count = self.count + 1

    #Get the list of the route
    def getRoute(self):
        result = []

        traveler = self.position
        result.append(traveler.dataval)

        traveler = traveler.nextval

        while traveler != self.position :
            result.append(traveler.dataval)
            traveler = traveler.nextval

        result.append(traveler.dataval)

        return result

    def Solve(self, Set_points):
        print("Solving TSP")

        #For calculating execution time
        time_start = datetime.datetime.now()

        #Copy the set points list
        points = Set_points.copy()

        #Transform the list into set
        points = set(tuple(i) for i in points)

        #Add 
        while len(points)>0 :
            print("Points left : ", len(points),'              ', end="\r")
            point = points.pop()
            self.add_city(point)

        result = self.getRoute()

        #For calculating execution time
        time_end = datetime.datetime.now()
        delta = (time_end-time_start).total_seconds()

        print("Points left : ", len(points),' Done              ',)
        print("Execution time : ", delta, "secs")

        return result

#######################
#Testing the Algorithm#
#######################

#Size of the set
size = 120

#Generating a set of random 2D points
points = []
for i in range(size):
    points.append((random.uniform(0, 100),random.uniform(0, 100)))

#Solve TSP
TSP = TSP_TimeTraveler()

route = TSP.Solve(points)

#Plot the solution
plt.scatter(*zip(*points),s=5)
plt.plot(*zip(*route))
plt.axis('scaled')
plt.show()

And using PyPy instead of normal python it runs alot faster :

  • 120 in around 0.03sec
  • 360 in around 0.05sec
  • 2520 in around 0.22sec
  • 10 000 in around 2sec
  • 100 000 in around 7min

The 100 000 case that took before 5 hours, now it's solved in 7min.

Next, I will try to implement a 2-opt with double linked list and KD-tree. So it can solve for large sets without crosses.

0

I improved the algorithm by adding double linked list and 2-opt at each insertion :

import math
import random
import datetime
import matplotlib.pyplot as plt

#Distance between two point
def distance(point1, point2):
    return (point2[0]-point1[0])**2 + (point2[1]-point1[1])**2

#Intersection between two segments
def intersects(p1, q1, p2, q2):
    def on_segment(p, q, r):
        if r[0] <= max(p[0], q[0]) and r[0] >= min(p[0], q[0]) and r[1] <= max(p[1], q[1]) and r[1] >= min(p[1], q[1]):
            return True
        return False

    def orientation(p, q, r):
        val = ((q[1] - p[1]) * (r[0] - q[0])) - ((q[0] - p[0]) * (r[1] - q[1]))
        if val == 0 : return 0
        return 1 if val > 0 else -1

    o1 = orientation(p1, q1, p2)
    o2 = orientation(p1, q1, q2)
    o3 = orientation(p2, q2, p1)
    o4 = orientation(p2, q2, q1)

    if o1 != o2 and o3 != o4:
        return True

    if o1 == 0 and on_segment(p1, q1, p2) : return True
    if o2 == 0 and on_segment(p1, q1, q2) : return True
    if o3 == 0 and on_segment(p2, q2, p1) : return True
    if o4 == 0 and on_segment(p2, q2, q1) : return True

    return False

#Distance Double Linked Node
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.prevval = None
        self.nextval = None

class TSP_TimeTraveler():
    def __init__(self):
        self.count = 0
        self.position = None
        self.length = 0
        self.traveler = None
        self.travelert_past = None
        self.is_2opt = True

    def get_position():
        return self.position

    def traveler_init(self):
        self.traveler = self.position
        self.travelert_past = self.position.prevval
        return self.traveler

    def traveler_next(self):
        if self.traveler.nextval != self.travelert_past:
            self.travelert_past = self.traveler
            self.traveler = self.traveler.nextval
            return self.traveler, False
        else :
            self.travelert_past = self.traveler
            self.traveler = self.traveler.prevval
            return self.traveler, True 

    #adding a city to the current route with Time Traveler Algorithm :
    def add_city(self, point):
        node = Node(point)
        if self.count <=0 :
            self.position = node
        elif self.count == 1 :
            node.nextval = self.position
            node.prevval = node
            self.position.nextval = node
            self.position.prevval = self.position
            self.length = 2*distance(self.position.dataval,node.dataval)
        elif self.count == 2 :
            node.nextval = self.position.nextval
            node.prevval = self.position
            self.position.nextval.prevval = node
            self.position.nextval = node
            self.length = 2*distance(self.position.dataval,node.dataval)
        else : 

            #Creating the traveler
            traveler = self.traveler_init()

            c = traveler #current position
            prev = False #inverse link

            n, prev = self.traveler_next()

            #Calculating the length of adding the city to the path
            Min_prev = prev
            Min_L = self.length-distance(c.dataval,n.dataval)+distance(c.dataval,node.dataval)+distance(node.dataval,n.dataval)
            Min_Node = c

            traveler = n

            while traveler != self.position :
                c = n #current position

                n, prev = self.traveler_next()

                #Calculating the length of adding the city to the path
                L = self.length-distance(c.dataval,n.dataval)+distance(c.dataval,node.dataval)+distance(node.dataval,n.dataval)

                #Searching the path to the of city with minimum length
                if L < Min_L :
                    Min_prev = prev 
                    Min_L = L
                    Min_Node = c
                traveler = n    

            if Min_prev : 
                Min_Next_Node = Min_Node.prevval
            else :
                Min_Next_Node = Min_Node.nextval

            node.nextval = Min_Next_Node
            node.prevval = Min_Node

            if Min_prev :
                Min_Node.prevval = node
            else :
                Min_Node.nextval = node

            if Min_Next_Node.nextval == Min_Node:
                Min_Next_Node.nextval = node
            else :
                Min_Next_Node.prevval = node
            
            self.length = Min_L
            
            #2-OP
            if self.is_2opt == True :
                self._2opt(Min_Node, node, Min_Next_Node)

        #Incrementing the number of city in the route
        self.count = self.count + 1

    #apply the 2opt to a-b-c
    def _2opt(self, a, b, c):
        traveler = self.traveler_init()

        c1 = a
        c2 = b

        n1 = b
        n2 = c

        c = traveler #current position
        t_prev = False
        n, t_prev = self.traveler_next()

        traveler = n

        while traveler != self.position :

            cross = False

            if (c.dataval != c1.dataval and c.dataval != c2.dataval and n.dataval != c1.dataval and n.dataval != c2.dataval) and intersects(c.dataval, n.dataval, c1.dataval, c2.dataval):
                
                self._2optswap(c,n,c1,c2)
                cross = True
                a = n
                n = c1
                c2 = a
                    
            if (c.dataval != n1.dataval and c.dataval != n2.dataval and n.dataval != n1.dataval and n.dataval != n2.dataval) and intersects(c.dataval, n.dataval, n1.dataval, n2.dataval):
                
                self._2optswap(c,n,n1,n2)
                cross = True
                a = n
                n = n1
                n2 = a

            if cross:
                return

            c = n #current position
            n, t_prev = self.traveler_next()
            traveler = n            


    #swap between the crossed segment a-b and c-d
    def _2optswap(self, a, b, c, d):

        if a.nextval == b :
            a.nextval = c
        else :
            a.prevval = c

        if b.prevval == a :
            b.prevval = d
        else :
            b.nextval = d

        if c.nextval == d :
            c.nextval = a
        else :
            c.prevval = a

        if d.prevval == c :
            d.prevval = b
        else :
            d.nextval = b

        self.length = self.length - distance(a.dataval,b.dataval) - distance(c.dataval,d.dataval) + distance(a.dataval,c.dataval) + distance(b.dataval,d.dataval)


    #Get the list of the route
    def getRoute(self):
        result = []

        traveler  = self.traveler_init()
        result.append(traveler.dataval)

        traveler, prev  = self.traveler_next()

        while traveler != self.position :
            result.append(traveler.dataval)
            traveler, prev = self.traveler_next()

        result.append(traveler.dataval)

        return result

    def Solve(self, Set_points, with_2opt = True):
        print("Solving TSP")

        #For calculating execution time
        time_start = datetime.datetime.now()

        #Copy the set points list
        points = Set_points.copy()

        #Transform the list into set
        points = set(tuple(i) for i in points)

        #Add 
        while len(points)>0 :
            print("Points left : ", len(points),'              ', end="\r")
            point = points.pop()
            self.add_city(point)

        result = self.getRoute()

        #For calculating execution time
        time_end = datetime.datetime.now()
        delta = (time_end-time_start).total_seconds()

        L=0
        for i in range(len(result)-1):
            L = L + math.sqrt((result[i-1][0]-result[i][0])**2 + (result[i-1][1]-result[i][1])**2)

        print("Points left : ", len(points),' Done              ',)
        print("Execution time : ", delta, "secs")
        print("Average time per point : ", 1000*delta/len(Set_points), "msecs")
        print("Length : ", L)

        return result

#######################
#Testing the Algorithm#
#######################

#Size of the set
size = 1000

#Generating a set of random 2D points
points = []
for i in range(size):
    points.append((random.uniform(0, 100),random.uniform(0, 100)))

#Solve TSP
TSP = TSP_TimeTraveler()
route = TSP.Solve(points, with_2opt = True)

plt.scatter(*zip(*route), s=5)
plt.plot(*zip(*route))
plt.axis('scaled')
plt.show()

Now the solution give fast results with no cross routes.

With PyPy it solves 100,000 points with no cross routes in 30 min.

Now Im working on implementing the KD-tree to solve for large sets.