Weird Bug in Python Generator

Question

I have an an implementation of Knuth's Algorithm ("dancing links") that behaves very strangely. I've found a workaround, but it's like magic. The script below tests the code on the N queens problem. The bug occurs in the first function, solve. The parameter limit is supposed to limit the number of solutions generated, and the default value of 0 means "generate all solutions".

#Test generalized exact cover for n queens problem 

def solve(Cols,  Rows, SecondaryIDs=set(), limit = 0):
    for soln in solver(Cols, Rows, SecondaryIDs):
        print('solve:', limit, soln)
        yield soln
        limit -= 1
        if limit == 0: return

def solver(Cols, Rows, SecondaryIDs, solution=[]):
    live=[col for col in Cols if col not in SecondaryIDs] 
    if not live:
        yield solution
    else:
        col = min(live, key = lambda col: len(Cols[col]))        
        for row in list(Cols[col]):                                          
            solution.append(row)                                            
            columns = select(Cols, Rows, row)                        
            for soln in solver(Cols, Rows, SecondaryIDs, solution):
                yield soln
            deselect(Cols, Rows, row, columns)
            solution.pop()

def select(Cols, Rows, row):
    columns = []
    for col in Rows[row]:
        for rrow in Cols[col]:
            for ccol in Rows[rrow]:
                if ccol != col:
                    Cols[ccol].remove(rrow)
        columns.append(Cols.pop(col))  
    return columns

def deselect(Cols, Rows, row, columns): 
    for col in reversed(Rows[row]):
        Cols[col] = columns.pop()
        for rrow in Cols[col]:
            for ccol in Rows[rrow]:
                if ccol != col:
                    Cols[ccol].add(rrow)

n = 5

# From Dancing Links paper
solutionCounts = {4:2, 5:10, 6:4, 7:40, 8:92, 9:352, 10:724}

def makeRows(n):
    # There is one row for each cell.    
    rows = dict()
    for rank in range(n):
        for file in range(n):
            rows["R%dF%d"%(rank,file)] = ["R%d"%rank, "F%d"%file, "S%d"%(rank+file), "D%d"%(rank-file)]
    return rows

def makePrimary(n):
    # One primary column for each rank and file
    prim = dict()
    for rank in range(n):
        prim["R%d"%rank] = {"R%dF%d"%(rank,file) for file in range(n)}
    for file in range(n):
        prim["F%d"%file] = {"R%dF%d"%(rank,file) for rank in range(n)}
    return prim

def makeSecondary(n):
    # One secondary column for each diagonal
    second = dict()
    for s in range(2*n-1):
        second["S%d"%s] = {"R%dF%d"%(r, s-r) for r in range(max(0,s-n+1), min(s+1,n))}
    for d in range(-n+1, n):
        second["D%d"%(-d)]={"R%dF%d"%(r, r+d) for r in range(max(0,-d),min(n-d, n))}
    return second

rows = makeRows(n)
primary = makePrimary(n)
secondary = makeSecondary(n)
primary.update(secondary)
secondary = secondary.keys()
#for soln in solve(primary, rows, secondary, 15):
    #print(soln)
solutions = [s for s in solve(primary, rows, secondary)]
try:
    assert len(solutions) == solutionCounts[n]
except AssertionError:
    print("actual %d expected %d"%(len(solutions), solutionCounts[n]))
for soln in solutions:print(soln)

The code is set up to generate the first 6 solutions to the 5 queens problem, and it works fine. (See the call

solutions = [s for s in solve(primary, rows, secondary, 6)]

at line 80.) There are actually 10 solutions, and if I ask for 10 solutions, I get them all. If I leave off the limit, so that the call is

solutions = [s for s in solve(primary, rows, secondary)]

the main program prints ten empty lists [] as the solutions, but the code in solve prints the real solutions. The same thing happens if I make the limit 15.

The problem seems to arise when I convert the generator to a list in line 80. If I put back the commented-out lines at lines 78 and 79, and comment out everything from line 80 on, the program works as I expect. But I don't understand this; I've often made a list of the objects returned by a generator in this way.

Another thing that's even stranger is that, if I change line 13 to read

yield list(solution)

then the code from line 80 on works fine in all cases. I can't remember how I stumbled on this kludge when I originally wrote the code. I was looking at it today, and changed yield list(solution) to yield solution and that's when the bug became apparent. I can't understand this; solution is already a list. In fact, I've tried adding the line

assert solution == list(solution)

just before line 13 and it never raises AssertionError.

I'm at a complete loss. I've tried to make a smaller script that will reproduce this behavior, but I haven't been able to. Do you understand what's going on, and (harder) can you explain it to me?

Could be another case of [“Least Astonishment” and the Mutable Default Argument](https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument) — Patrick Haugh, Sep 06 '18 at 02:59
See [What is the pythonic way to avoid default parameters that are empty lists?](https://stackoverflow.com/questions/366422/what-is-the-pythonic-way-to-avoid-default-parameters-that-are-empty-lists) — John Kugelman, Sep 06 '18 at 03:00
Nope. The default argument doesn't seem to be the case in this code — iBug, Sep 06 '18 at 03:06
@iBug Yes, I don't see how it could be. I'll change it, just to be sure, but I'm not optimistic. — saulspatz, Sep 06 '18 at 03:07

score 4 · Answer 1 · answered Sep 06 '18 at 03:07

Prediction before seeing the code: yield list(solution) yields a shallow copy of solution. yield solution yields the solution list itself, so when you mutate that list afterwards, you get into trouble.

And it looks like I'm right. :-) Shorter version:

def weird(solution):
    for i in range(len(solution)):
        yield solution
        solution.pop()

which gives:

In [8]: result = list(weird(['a','b','c']))

In [9]: result
Out[9]: [[], [], []]

because

In [10]: [id(x) for x in result]
Out[10]: [140436644005128, 140436644005128, 140436644005128]

but if we yield list(solution) instead, we get

In [15]: list(less_weird(['a','b','c']))
Out[15]: [['a', 'b', 'c'], ['a', 'b'], ['a']]

First we see one of your mutable default arguments, which is a bad idea, but not actually the cause of the bug you see:

def solver(Cols, Rows, SecondaryIDs, solution=[]):
    live=[col for col in Cols if col not in SecondaryIDs] 
    if not live:
        yield solution

Here you yield the solution ^..

else:
    col = min(live, key = lambda col: len(Cols[col]))        
    for row in list(Cols[col]):                                          
        solution.append(row)                                            
        columns = select(Cols, Rows, row)                        
        for soln in solver(Cols, Rows, SecondaryIDs, solution):
            yield soln
        deselect(Cols, Rows, row, columns)
        solution.pop()

And here you mutate the same list you yielded beforehand.

score 4 · Accepted Answer · answered Sep 06 '18 at 03:08

yield solution

The problem is you're yielding a list which you subsequently add and remove items from. By the time the caller examines the list it has changed. You need to return frozen copies of the solution to ensure the results at the point of each yield statement are preserved. Any one of these will do:

yield list(solution)
yield solution[:]
yield tuple(solution)

Weird Bug in Python Generator

2 Answers2