Avoid a deepcopy when doing a BFS

Question

I'm currently solving the second exercise in this assignment (this is not homework, I'm actually trying to solve this other problem). My solution uses a BFS to search for the minimal solution to a variant of the "Lights Out" problem, in which pressing a light will flip the state of every light on the same row and the same column.

I think that my implementation is correct, but it's a bit too slow: it's currently taking 12+ seconds to run on my computer (which is unacceptable for my purposes).

from copy import deepcopy
from itertools import chain
from Queue import PriorityQueue

# See: http://www.seas.upenn.edu/~cis391/Homework/Homework2.pdf
class Puzzle(object):
    def __init__(self, matrix):
        self.matrix = matrix
        self.dim = len(matrix)

    def __repr__(self):
        return str(self.matrix)

    def solved(self):
        return sum([sum(row) for row in self.matrix]) == 0

    def move(self, i, j):
        for k in range(self.dim):
            self.matrix[i][k] = (self.matrix[i][k] + 1) % 2
            self.matrix[k][j] = (self.matrix[k][j] + 1) % 2
        self.matrix[i][j] = (self.matrix[i][j] + 1) % 2

        return self

    def copy(self):
        return deepcopy(self)

    def next(self):
        result = []

        for i in range(self.dim):
            for j in range(self.dim):
                result.append(self.copy().move(i, j))

        return result

    def solve(self):
        q = PriorityQueue()
        v = set()

        q.put((0, self))
        while True:
            c = q.get()

            if c[1].solved():
                return c[0]
            else:
                for el in c[1].next():
                    t = el.tuple()

                    if t not in v:
                        v.add(t)
                        q.put((c[0] + 1, el))

    def tuple(self):
         return tuple(chain.from_iterable(self.matrix))

The culprit, according to cProfile, appears to be the deepcopy call. On the other hand, I see no alternatives: I need to add to the queue another Puzzle object containing a fresh copy of self.matrix.

How can I speed up my implementation?

Here's the test case that I'm using:

print Puzzle([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]).solve()

which should return 1 (we only need to press the light in the lower right corner).

EDIT: Here's another gnarly test case:

print Puzzle([
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
]).solve()

Its solution is at most 14: press all lights on the diagonal that were already on. Unfortunately, the impressive speedup by @zch isn't enough to solve this problem, leading me to believe that, due to the high branching factor, a BFS wasn't the right way to solve this problem.

zch · Accepted Answer · 2014-12-07T10:56:27.080

There is a number of optimizations to be done.

First, avoid deepcopy, implement it your own copying (this by itself worked for me 5x faster):

class Puzzle(object):
    def __init__(self, matrix):
        self.matrix = [list(row) for row in matrix]
        self.dim = len(matrix)

    def copy(self):
        return Puzzle(self.matrix)

Second, in BFS you don't need priority queue, use Queue or implement your own queuing. This gives some speedup. And third, check for being solved before putting it into the queue, not after taking things out. This should allow you to go one level deeper in comparable time:

def solve(self):
    v = set()

    q = [(0, self)]
    i = 0
    while True:
        c = q[i]
        i += 1

        for el in c[1].next():
            t = el.tuple()

            if t not in v:
                if el.solved():
                    return c[0] + 1
                v.add(t)
                q.append((c[0] + 1, el))

Further, using a list of list of bits is very memory-inefficient. You can pack all the bits into a single integer and get much faster solution. Additionally you can precompute masks for allowed moves:

def bits(iterable):
    bit = 1
    res = 0
    for elem in iterable:
        if elem:
            res |= bit
        bit <<= 1
    return res

def mask(dim, i, j):
    res = 0
    for idx in range(dim * i, dim * (i + 1)):
        res |= 1 << idx
    for idx in range(j, dim * dim, dim):
        res |= 1 << idx
    return res

def masks(dim):
    return [mask(dim, i, j) for i in range(dim) for j in range(dim)]

class Puzzle(object):
    def __init__(self, matrix):
        if isinstance(matrix, Puzzle):
            self.matrix = matrix.matrix
            self.dim = matrix.dim
            self.masks = matrix.masks
        else:
            self.matrix = bits(sum(matrix, []))
            self.dim = len(matrix)
            self.masks = masks(len(matrix))

    def __repr__(self):
        return str(self.matrix)

    def solved(self):
        return self.matrix == 0

    def next(self):
        for mask in self.masks:
            puzzle = Puzzle(self)
            puzzle.matrix ^= mask
            yield puzzle

    def solve(self):
        v = set()

        q = [(0, self)]
        i = 0
        while True:
            c = q[i]
            i += 1

            for el in c[1].next():
                t = el.matrix

                if t not in v:
                    if el.solved():
                        return c[0] + 1
                    v.add(t)
                    q.append((c[0] + 1, el))

And finally for another factor of 5 you can pass around just bit matrices, instead of whole Puzzle objects and additionally inline some most often used function.

def solve(self):
    v = set()

    q = [(0, self.matrix)]
    i = 0
    while True:
        dist, matrix = q[i]
        i += 1

        for mask in self.masks:
            t = matrix ^ mask

            if t not in v:
                if t == 0:
                    return dist + 1
                v.add(t)
                q.append((dist + 1, t))

For me these optimizations combined give speedup of about 250 times.

Thank you very much for all these tips. I'm going to accept your answer, but I'm also going to add an harder test case that, unfortunately, this optimized implementation can't solve. That's not your fault, though, it's the fact that the algorithm I chose doesn't scale well with n. — Jacopo Notarstefano, Dec 07 '14 at 16:56
@JacopoNotarstefano, I found your question interesting and also [implemented different algorithm](http://ideone.com/nC1O0S). This is basically [http://math.stackexchange.com/a/441697](Aryabhata's solution) with simplification/normalization phase based on idea that moves on four corners of any rectangle cancel each other out. It returns matrix of places where you should make a move. — zch, Dec 08 '14 at 00:15
Thank you again. Here's a failing test case for your algorithm: Puzzle([ [0, 0, 1], [0, 1, 0], [1, 0, 0] ]).solve() I think your idea is pretty neat, though. — Jacopo Notarstefano, Dec 08 '14 at 20:40
@JacopoNotarstefano, my mistake, touching corners of rectangle changes corners of rectangle. There are some other selections that cause no change (for example `row/column xor row/column` on odd-sized board); perhaps some collection of them would be sufficient for normalizing to optimal solution, perhaps not. — zch, Dec 08 '14 at 21:44

score 2 · Answer 2 · answered Dec 06 '14 at 23:58

I changed solve to

    def solve(self):
        q = PriorityQueue()
        v = set()

        q.put((0, self))
        while True:
            c = q.get()

            if c[1].solved():
                return c[0]
            else:
                for i in range(self.dim):
                    for j in range(self.dim):
                        el = c[1].move(i, j) # do the move
                        t = el.tuple()

                        if t not in v:
                           v.add(t)
                           q.put((c[0] + 1, el.copy())) # copy only as needed

                        c[1].move(i, j) # undo the move

As .move(i, j) is its own inverse. Copies are made but only when the state has not been visited. This reduces the time from 7.405s to 5.671s. But this is not as big an improvement as expected.

Then replacing def tuple(self): with:

def tuple(self):
     return tuple(tuple(r) for r in self.matrix)

reduces the time from 5.671s to 0.531s. That should do it.

Full listing:

from copy import deepcopy
from Queue import PriorityQueue

# See: http://www.seas.upenn.edu/~cis391/Homework/Homework2.pdf
class Puzzle(object):
    def __init__(self, matrix):
        self.matrix = matrix
        self.dim = len(matrix)

    def __repr__(self):
        return str(self.matrix)

    def solved(self):
        return sum([sum(row) for row in self.matrix]) == 0

    def move(self, i, j):
        for k in range(self.dim):
            self.matrix[i][k] = (self.matrix[i][k] + 1) % 2
            self.matrix[k][j] = (self.matrix[k][j] + 1) % 2
        self.matrix[i][j] = (self.matrix[i][j] + 1) % 2

        return self

    def copy(self):
        return deepcopy(self)

    def solve(self):
        q = PriorityQueue()
        v = set()

        q.put((0, self))
        while True:
            c = q.get()

            if c[1].solved():
                return c[0]
            else:
                for i in range(self.dim):
                    for j in range(self.dim):
                        el = c[1].move(i, j) # do the move
                        t = el.tuple()

                        if t not in v:
                           v.add(t)
                           q.put((c[0] + 1, el.copy())) # copy only as needed

                        c[1].move(i, j) # undo the move

    def tuple(self):
         return tuple(tuple(r) for r in self.matrix)

print Puzzle([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]).solve()

Thank you very much for your answer. I'm amazed that the call to `itertools` was that slow, while that simpler "serialization" was that much faster. — Jacopo Notarstefano, Dec 07 '14 at 17:01

Avoid a deepcopy when doing a BFS

2 Answers2

Linked