Specific problem
I have a problem that I want to solve using a SAT solver in Python (e.g. pycosat or cryptominisat). A SAT solver accepts problem input as a list a CNF expression comprised of individual Boolean clauses.
For my specific task, the problem can be expressed as the following Boolean pseudo-code:
ATMOST_1 (( A1 and not (A11, A12, A13, ...)) or (A2 and not (A21, A22, ...) or ... )) and
ATMOST_1 (( B1 and not (B11, B12, B13, ...)) or (B2 and not (B21, B22, ...) or ... )) and
ATMOST_1 ( ... )
Here, ATMOST_1
means a constraint telling that only 0 or 1 of the conjugated expressions (A1, A2, ...) must be true.
I can't understand how I can convert this to valid CNF. To better understand the roots of this problem, please read further...
Target problem
The root problem is an automatic Arroword filling algorithm I'm working on. An Arroword is a crossword-type puzzle, where blocked cells contain clues for adjacent words. Here's an example of a 25x25 Arroword grid with various word/clue pairs shown (word cells are green and clue cell options are orange):
The total number of vertical and horizontal word/clue options for this grid is 28248 (I've made a rather long formula to derive this number from the grid size and min/max word length). So, in SAT terms, it means exactly as many variables.
To approach the SAT problem, we also need to add constraints. Naturally, each word/clue variant has its own constraints based on its placement, length and structure. Below is a schematic showing constraints for a given word:
Here constrains are shown in grey-highlighted cells. Basically for each word/clue (variable) the following constrains apply:
- no other words trackable from the same clue cell
- no words overlapping with the given word in the same direction ('extending' it)
- no words crossing the clue cell (in any direction)
In practice, every such constraint is a subset of the pool of variables that satisfy the above criteria. So here we approach the SAT problem in question:
For each cell in the grid, find 0 or 1 variable ('at most 1') that satisfies its constraints.
Which gives the following Boolean formula:
( cell1_var1 & ~{constraints} | cell1_var2 & ~{constraints} | ... ) &
( cell2_var1 & ~{constraints} | cell2_var2 & ~{constraints} | ... ) &
... (until there are no blank cells)
In DNF format, it would look far simpler, like:
cell1_var1 & ~{constraints}
cell1_var2 & ~{constraints}
...
And the ultimate task would be to maximize the number of satisfactory clauses.
My code can currently generate the pool of variables (encoding them as integers if required), as well as constrains for each variable.
In opting for a SAT solution, I was inspired by this post on SO. In fact, it is a very close solution to my own, however, it's just too hardcore for my understanding...
Caveats of SAT
Some things I would like to achieve when dealing with the problem:
- make a separate clause monitoring the number of unfilled (blank) cells (no cells must be left blank) -- probably incremental solving might help?
- avoid polynomial expansion of CNF clauses when converting from DNF/mixed type formula (with variable count of orders 10k and constrains ~1k for each, it is likely to produce billions of cluses)