How to improve performance of this code?

Question

Thanks to some help from people here, I was able to get my code for Tasmanian camels puzzle working. However, it is horribly slow (I think. I'm not sure because this is my first program in Python). The example run in the bottom of the code takes a long time to be solved in my machine:

dumrat@dumrat:~/programming/python$ time python camels.py
[['F', 'F', 'F', 'G', 'B', 'B', 'B'], ['F', 'F', 'G', 'F', 'B', 'B', 'B'],
 ['F', 'F', 'B', 'F', 'G', 'B', 'B'], ['F', 'F', 'B', 'F', 'B', 'G', 'B'],
 ['F', 'F', 'B', 'G', 'B', 'F', 'B'], ['F', 'G', 'B', 'F', 'B', 'F', 'B'],
 ['G', 'F', 'B', 'F', 'B', 'F', 'B'], ['B', 'F', 'G', 'F', 'B', 'F', 'B'],
 ['B', 'F', 'B', 'F', 'G', 'F', 'B'], ['B', 'F', 'B', 'F', 'B', 'F', 'G'],
 ['B', 'F', 'B', 'F', 'B', 'G', 'F'], ['B', 'F', 'B', 'G', 'B', 'F', 'F'],
 ['B', 'G', 'B', 'F', 'B', 'F', 'F'], ['B', 'B', 'G', 'F', 'B', 'F', 'F'],
 ['B', 'B', 'B', 'F', 'G', 'F', 'F']]

real    0m20.883s
user    0m20.549s
sys    0m0.020s

Here's the code:

import Queue

fCamel = 'F'
bCamel = 'B'
gap = 'G'

def solution(formation):
    return len([i for i in formation[formation.index(fCamel) + 1:]
                if i == bCamel]) == 0

def heuristic(formation):
    fCamels, score = 0, 0
    for i in formation:
        if i == fCamel:
            fCamels += 1;
        elif i == bCamel:
            score += fCamels;
        else:
            pass
    return score

def getneighbors (formation):
    igap = formation.index(gap)
    res = []
    # AB_CD --> A_BCD | ABC_D | B_ACD | ABD_C
    def genn(i,j):
        temp = list(formation)
        temp[i], temp[j] = temp[j], temp[i]
        res.append(temp)

    if(igap > 0):
        genn(igap, igap-1)
    if(igap > 1):
        genn(igap, igap-2)
    if igap < len(formation) - 1:
        genn(igap, igap+1)
    if igap < len(formation) - 2:
        genn(igap, igap+2)

    return res

class node:
    def __init__(self, a, g, p):
        self.arrangement = a
        self.g = g
        self.parent = p

def astar (formation, heuristicf, solutionf, genneighbors):

    openlist = Queue.PriorityQueue()
    openlist.put((heuristicf(formation), node(formation, 0, None)))
    closedlist = []

    while 1:
        try:
            f, current = openlist.get()
        except IndexError:
            current = None

        if current is None:
            print "No solution found"
            return None;

        if solutionf(current.arrangement):
            path = []
            cp = current
            while cp != None:
                path.append(cp.arrangement)
                cp = cp.parent
            path.reverse()
            return path

        #arr = current.arrangement
        closedlist.append(current)
        neighbors = genneighbors(current.arrangement)

        for neighbor in neighbors:
            if neighbor in closedlist:
                pass
            else:
                openlist.put((current.g + heuristicf(neighbor),
                             node(neighbor, current.g + 1, current)))

        #sorted(openlist, cmp = lambda x, y : x.f > y.f)

def solve(formation):
    return astar(formation, heuristic, solution, getneighbors)

print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
#print solve([fCamel, fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel, bCamel])

That is just for 3 camels each. I wanted to do this for 4 at least. That test case is still running (It's been about 5 minutes now :(). I'll update this if and when it finishes.

What should I do to improve this code? (Mostly performance-wise, but any other suggestions are welcome also).

@nakiya: Use http://docs.python.org/library/heapq.html#module-heapq for a priority queue if you don't intend to create a multi-threaded program. (This isn't the bottleneck though.) — kennytm, Nov 28 '10 at 08:00
@KennyTM: I tried it. But I think it must be in some collection. I just went through with the priority queue. NameError: name 'heappush' is not defined — nakiya, Nov 28 '10 at 08:06
@nakiya: `import heapq` ... `openlist = []; heapq.heappush(openlist, stuff)`. — kennytm, Nov 28 '10 at 08:32
The search space for this specific puzzle is very small. In fact, there are only 140 possible ways to order 3 camels each going in either direction. not all of those are even reachable. If your solver takes more than the blink of an eye to reach a conclusion, then you've gone awry somewhere. — SingleNegationElimination, Nov 28 '10 at 08:43
@TokenMacGuy: I think that is true. :) Where have I gone wrong though? I can't figure it out. — nakiya, Nov 28 '10 at 08:56
Hmm. I can't seem to find where in your code you determine which moves are valid moves, IE for the very small game: `[gap, fCamel]`, your code appears to permit `[fCamel, gap]` as a valid neighbor, even though the camel cannot move in that direction. — SingleNegationElimination, Nov 28 '10 at 11:28
"I'll update this if and when it finishes." Did your code ever stop executing? — hochl, Dec 21 '12 at 16:22
One more reason for the slow down of program as the number of camels increases is the use of append. refer this link http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower — Thiru, Jan 22 '13 at 13:06

Mike Dunlavey · Answer 1 · 2011-08-10T13:37:13.523

First let me tell you how to find the problem. Then I'll tell you where it is:

I haven't even bothered to try to figure out your code. I just ran it and took 3 random-time stack samples. I did that by typing control-C and looking at the resulting stacktrace.

One way to look at it is: if a statement appears on X% of random stack traces, then it is on the stack for about X% of the time, so that is what it's responsible for. If you could avoid executing it, that is how much you would save.

OK, I took 3 stack samples. Here they are:

File "camels.py", line 87, in <module>
  print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
File "camels.py", line 85, in solve
  return astar(formation, heuristic, solution, getneighbors)
File "camels.py", line 80, in astar
  openlist.put((current.g + heuristicf(neighbor), node(neighbor, current.g + 1, current)))

File "camels.py", line 87, in <module>
  print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
File "camels.py", line 85, in solve
  return astar(formation, heuristic, solution, getneighbors)
File "camels.py", line 80, in astar
  openlist.put((current.g + heuristicf(neighbor), node(neighbor, current.g + 1, current)))

File "camels.py", line 87, in <module>
  print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
File "camels.py", line 85, in solve
  return astar(formation, heuristic, solution, getneighbors)
File "camels.py", line 80, in astar
  openlist.put((current.g + heuristicf(neighbor), node(neighbor, current.g + 1, current)))

Notice, in this case the stack samples are all identical. In other words, each one of these three lines is individually responsible for nearly all of the time. So look at them:

line        87: print solve([fCamel, fCamel, fCamel, gap, bCamel, bCamel, bCamel])
line solve: 85: return astar(formation, heuristic, solution, getneighbors)
line astar: 80: openlist.put((current.g + heuristicf(neighbor), node(neighbor, current.g + 1, current)))

Clearly line 87 is not one you can avoid executing, and probably not 85 either. That leaves 80, the openlist.put call. Now, you can't tell if the problem is in the + operator, the heuristicf call, the node call, or in the put call. You could find out if you could split those out onto separate lines.

So I hope you pick up from this a quick and easy way to find out where your performance problems are.

Really interesting way of debugging a performance issue. Also highlights that sometimes separating calls onto their own line can be helpful. — Josh Smeaton, Nov 28 '10 at 23:12
@Josh :-) don't get me started ... http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343 — Mike Dunlavey, Nov 29 '10 at 02:09
Unfortunately, the way asynchronous exceptions are handled kind of ruins that nice approach. — YvesgereY, Aug 04 '17 at 18:26
(They have deleted *[Alternatives to gprof](https://stackoverflow.com/questions/1777556)* and all its answers. BTW: the http://www.rotateright.com/ link is broken, but the main page says *"Zoom has not been updated since 2015, so it may not work on newer systems. There are no plans to continue development on Zoom at this time."*) — Peter Mortensen, Aug 18 '18 at 13:00
@PeterMortensen: So it continues, and it doesn't surprise me that *Zoom* is gasping for breath. I was a professor. I know how their thinking and their teaching runs in channels.The "Alternatives to gprof" is not completely gone, though. It's archived [*here*](http://archive.fo/9r927). — Mike Dunlavey, Aug 18 '18 at 15:36
@PeterMortensen: I just saw your first question above. I don't know. If there are times when it can't handle an interrupt, it can't get visibility into those times, but even so, it may be OK if the interrupt is handled immediately after. Like if interrupts are deferred until after an I/O operation, that doesn't give you visibility inside the I/O, but it does tell you which I/O operation in source code the interrupt was inside of. So maybe it's not a problem. — Mike Dunlavey, Aug 18 '18 at 15:49

tkerwin · Accepted Answer · 2010-11-28T09:30:26.370

39

I've been tripped up by this before too. The bottleneck here is actually if neighbor in closedlist.

The in statement is so easy to use, you forget that it's linear search, and when you're doing linear searches on lists, it can add up fast. What you can do is convert closedlist into a set object. This keeps hashes of its items so the in operator is much more efficient than for lists. However, lists aren't hashable items, so you will have to change your configurations into tuples instead of lists.

If the order of closedlist is crucial to the algorithm, you could use a set for the in operator and keep an parallel list around for your results.

I tried a simple implementation of this including aaronasterling's namedtuple trick and it performed in 0.2 sec for your first example and 2.1 sec for your second, but I haven't tried verifying the results for the second longer one.

edited Nov 28 '10 at 09:30

answered Nov 28 '10 at 09:03

tkerwin

9,559
1
31
47

Also, sets aren't ordered and keeping the potential solutions in a sorted container seems to be crucial to the algorithm (which I don't yet understand) so using set's are pretty useless unless the algorithm can be changed to not depend on the sorted property. – aaronasterling Nov 28 '10 at 09:05
1

I don't think keeping the `closedlist` ordered is crucial to the algorithm, but I could be wrong. – tkerwin Nov 28 '10 at 09:14
1

The `in` could be the bottleneck, if something else weren't bigger - *much bigger*. Something on the `openlist.put` line is much bigger. – Mike Dunlavey Nov 29 '10 at 02:07

Justin Peel · Answer 3 · 2010-11-29T01:59:06.220

tkerwin is correct that you should be using a set for closedlist, which speeds things up a lot, but it is still kind of slow for 4 camels on each side. The next problem is that you are allowing a lot of solutions that aren't possible because you are allowing fCamels to go backwards and bCamels to go forward. To fix this, replace the lines,

if(igap > 0):
    genn(igap, igap-1)
if(igap > 1):
    genn(igap, igap-2)
if igap < len(formation) - 1:
    genn(igap, igap+1)
if igap < len(formation) - 2:
    genn(igap, igap+2)

with

if(igap > 0 and formation[igap-1] == fCamel):
    genn(igap, igap-1)
if(igap > 1 and formation[igap-2] == fCamel):
    genn(igap, igap-2)
if (igap < len(formation) - 1) and formation[igap+1] == bCamel:
    genn(igap, igap+1)
if (igap < len(formation) - 2) and formation[igap + 2] == bCamel:
    genn(igap, igap+2)

then I get solution to the 4 camels on each side problem in like .05 seconds rather than 10 seconds. I also tried 5 camels on each side and it took 0.09 seconds. I also am using a set for closedlist and heapq rather than Queue.

Additional speed-up

You can get an additional speed-up by using your heuristic correctly. Currently, you are using the line

openlist.put((current.g + heuristicf(neighbor), node(neighbor, current.g + 1, current)))

(or the heapq version of that) but you should change it to

openlist.put((heuristicf(neighbor), node(neighbor, current.g + 1, current)))

This doesn't factor in the number of moves that has been needed, but that is okay. With this puzzle (and the screening out of moves that move camels in the wrong direction), you don't need to worry about the number of moves it takes - either a move advances you towards the solution or it will come to a dead end. In other words, all possible solutions require the same number of moves. This one change takes the time to find the solution of the 12 camels on each side case from over 13 seconds (even using the heapq, set for closedlist, and the changes to find the neighbors above) to 0.389 seconds. That's not bad.

By the way, a better way to find if you've found the solution is to check if the index of the first fCamel is equal to the length of the formation/2 + 1(using int division) and that the index before that is equal to the gap.

Far more speedup than my solution. Using the right data structures is a standard optimization, but picking out logical problems in the algorithm is the way to go. — tkerwin, Nov 29 '10 at 04:12

score 4 · Answer 4 · edited Aug 18 '18 at 12:21

Replacing

class node:
    def __init__(self, a, g, p):
        self.arrangement = a
        self.g = g
        self.parent = p

with

node = collections.namedtuple('node', 'arrangement, g, parent')

dropped the times from around 340-600 msecs to ~~11.4~~ 1.89¹ msecs on the input [fCamel, fCamel, gap, bCamel, bCamel]. It yielded the same output.

This obviously won't help with any algorithmic problems but as far as micro-optimizations go, it isn't bad.

^{1 I had the wrong input. There was an extra fCamel that was making it run slower.}

score 4 · Answer 5 · edited Aug 18 '18 at 12:22

The code below is using less than 1 second to solve this:

from itertools import permutations

GAP='G'
LEFT='F'
RIGHT='B'
BEGIN=('F','F','F','F','G','B','B','B','B')
END=('B','B','B','B','G','F','F','F','F')
LENGTH=len(BEGIN)

ALL=set(permutations(BEGIN))

def NextMove(seq):
    g=seq.index(GAP)
    ret = set()
    def swap(n):
        return seq[:n]+seq[n:n+2][::-1]+seq[n+2:]
    if g>0 and seq[g-1]==LEFT:
        ret.add(swap(g-1))
    if g<LENGTH-1 and seq[g+1]==RIGHT:
        ret.add(swap(g))
    if g<LENGTH-2 and seq[g+1]==LEFT and seq[g+2]==RIGHT:
        ret.add(seq[:g]+seq[g+2:g+3]+seq[g+1:g+2]+seq[g:g+1]+seq[g+3:])
    if g>1 and seq[g-1]==RIGHT and seq[g-2]==LEFT:
        ret.add(seq[:g-2]+seq[g:g+1]+seq[g-1:g]+seq[g-2:g-1]+seq[g+1:])

    return ret

AllMoves={state:NextMove(state) for state in ALL}

def Solve(now,target):
    if now==target:
        return True
    for state in AllMoves[now]:
        if Solve(state,target):
            print(now)
            return True
    return False

Solve(BEGIN,END)

@SingleNegationElimination note, that dict comprehension is available since Python 2.7. — Jan Vlcinsky, May 05 '16 at 09:14

score 3 · Answer 6 · edited Aug 18 '18 at 12:24

Well, I can't really say quite where your algorithm is running astray, but I just went ahead and made my own. In the interest of doing the simplest thing that could possibly work, I used a bastardized version of Dijkstra's algorithm, where open nodes are visited in arbitrary order, without consideration of distance. This means I don't have to come up with a heuristic.

""" notation: a game state is a string containing angle
    brackets ('>' and '<') and blanks
 '>>> <<<'

 """

def get_moves(game):
    result = []
    lg = len(game)
    for i in range(lg):
        if game[i] == '>':
            if i < lg-1 and game[i+1] == ' ': # '> ' -> ' >'
                result.append(game[:i]+' >'+game[i+2:])
            if i < lg-2 and game[i+1] != ' ' and game[i+2] == ' ': # '>< ' -> ' <>'
                result.append(game[:i]+' '+game[i+1]+'>'+game[i+3:])
        if game[i] == '<':
            if i >= 1 and game[i-1] == ' ': # ' <' -> '< '
                result.append(game[:i-1]+'< '+game[i+1:])
            if i >= 2 and game[i-1] != ' ' and game[i-2] == ' ': # ' ><' -> '<> '
                result.append(game[:i-2]+'<'+game[i-1]+' '+game[i+1:])
    return result



def wander(start, stop):
    fringe = [start]
    paths = {}

    paths[start] = ()

    def visit(state):
      path = paths[state]
      moves = [move for move in get_moves(state) if move not in paths]
      for move in moves:
          paths[move] = paths[state] + (state,)
      fringe.extend(moves)

    while stop not in paths:
      visit(fringe.pop())

    print "still open: ", len(fringe)
    print "area covered: " , len(paths)
    return paths[stop] + (stop,)

if __name__ == "__main__":
    start = '>>>> <<<<'
    stop = '<<<< >>>>'
    print start, "   -->   ", stop
    pathway = wander(start,stop)
    print len(pathway), "moves: ", pathway

score 0 · Answer 7 · answered Nov 29 '10 at 06:39

My other answer is rather long, so I decided to list this as a separate answer. This problem is really better suited to doing a depth-first search. I made a depth-first search solution and it is much, much faster than the optimized A-star method made with the changes outlined in my other answer (which is much, much faster than the OP code). For instance, here are the results for running both my A-star and my depth-first search methods on the 17 camels per side case.

A-star:  14.76 seconds
Depth-first search: 1.30 seconds

Here's my depth-first method code if you are interested:

from sys import argv

fCamel = 'F'
bCamel = 'B'
gap = 'G'

def issolution(formlen):
    def solution(formation):
        if formation[formlen2] == gap:
            return formation.index(fCamel) == x
        return 0
    x = formlen/2 + 1
    formlen2 = formlen/2
    return solution

def solve(formation):
    def depthfirst(form, g):
        if checksolution(form):
            return [tuple(form)], g + 1
        else:
            igap = form.index(gap)
            if(igap > 1 and form[igap-2] == fCamel):
                form[igap-2],form[igap] = form[igap],form[igap-2]
                res = depthfirst(form,g+1)
                form[igap-2],form[igap] = form[igap],form[igap-2]
                if res != 0:
                    return [tuple(form)]+res[0],res[1]
            if (igap < flen - 2) and form[igap + 2] == bCamel:
                form[igap+2],form[igap] = form[igap],form[igap+2]
                res = depthfirst(form,g+1)
                form[igap+2],form[igap] = form[igap],form[igap+2]
                if res != 0:
                    return [tuple(form)]+res[0],res[1]
            if(igap > 0 and form[igap-1] == fCamel):                
                form[igap-1],form[igap] = form[igap],form[igap-1]
                res = depthfirst(form,g+1)
                form[igap-1],form[igap] = form[igap],form[igap-1]
                if res != 0:
                    return [tuple(form)]+res[0],res[1]               
            if (igap < flen - 1) and form[igap+1] == bCamel:
                form[igap+1],form[igap] = form[igap],form[igap+1]
                res = depthfirst(form,g+1)
                form[igap+1],form[igap] = form[igap],form[igap+1]
                if res != 0:
                    return [tuple(form)]+res[0],res[1]                
            return 0
    flen = len(formation)
    checksolution = issolution(flen)
    res = depthfirst(list(formation), 0)
    return res

L = lambda x: tuple([fCamel]*x + [gap] + [bCamel]*x)
print solve(L(int(argv[1])))

How to improve performance of this code?

7 Answers7

Linked