1

Specific problem

I have a problem that I want to solve using a SAT solver in Python (e.g. pycosat or cryptominisat). A SAT solver accepts problem input as a list a CNF expression comprised of individual Boolean clauses.

For my specific task, the problem can be expressed as the following Boolean pseudo-code:

ATMOST_1 (( A1 and not (A11, A12, A13, ...)) or (A2 and not (A21, A22, ...) or ... )) and
ATMOST_1 (( B1 and not (B11, B12, B13, ...)) or (B2 and not (B21, B22, ...) or ... )) and
ATMOST_1 ( ... )

Here, ATMOST_1 means a constraint telling that only 0 or 1 of the conjugated expressions (A1, A2, ...) must be true.

I can't understand how I can convert this to valid CNF. To better understand the roots of this problem, please read further...

Target problem

The root problem is an automatic Arroword filling algorithm I'm working on. An Arroword is a crossword-type puzzle, where blocked cells contain clues for adjacent words. Here's an example of a 25x25 Arroword grid with various word/clue pairs shown (word cells are green and clue cell options are orange):

enter image description here

The total number of vertical and horizontal word/clue options for this grid is 28248 (I've made a rather long formula to derive this number from the grid size and min/max word length). So, in SAT terms, it means exactly as many variables.

To approach the SAT problem, we also need to add constraints. Naturally, each word/clue variant has its own constraints based on its placement, length and structure. Below is a schematic showing constraints for a given word:

enter image description here

Here constrains are shown in grey-highlighted cells. Basically for each word/clue (variable) the following constrains apply:

  • no other words trackable from the same clue cell
  • no words overlapping with the given word in the same direction ('extending' it)
  • no words crossing the clue cell (in any direction)

In practice, every such constraint is a subset of the pool of variables that satisfy the above criteria. So here we approach the SAT problem in question:

For each cell in the grid, find 0 or 1 variable ('at most 1') that satisfies its constraints.

Which gives the following Boolean formula:

( cell1_var1 & ~{constraints} | cell1_var2 & ~{constraints} | ... ) &
( cell2_var1 & ~{constraints} | cell2_var2 & ~{constraints} | ... ) &
... (until there are no blank cells)

In DNF format, it would look far simpler, like:

cell1_var1 & ~{constraints}
cell1_var2 & ~{constraints}
...

And the ultimate task would be to maximize the number of satisfactory clauses.

My code can currently generate the pool of variables (encoding them as integers if required), as well as constrains for each variable.

In opting for a SAT solution, I was inspired by this post on SO. In fact, it is a very close solution to my own, however, it's just too hardcore for my understanding...

Caveats of SAT

Some things I would like to achieve when dealing with the problem:

  • make a separate clause monitoring the number of unfilled (blank) cells (no cells must be left blank) -- probably incremental solving might help?
  • avoid polynomial expansion of CNF clauses when converting from DNF/mixed type formula (with variable count of orders 10k and constrains ~1k for each, it is likely to produce billions of cluses)
s0mbre
  • 361
  • 2
  • 14
  • Are you familiar with [Tseytin transforms](https://en.wikipedia.org/wiki/Tseytin_transformation)? You'll want to introduce intermediate variables that stand in for your expressions. The result will be something like `ATMOST_1(V0, V1, V2, ..)` where V_n are various variables that summarize the expression. Modern solvers are built around an understanding and expectation that inputs are derived this way, and perform various techniques to extract the meaningful parts of it. As for AtMost constraints, there is literature around this but the "obvious" way works well. It's helpful to internalize... – GManNickG Oct 20 '21 at 08:15
  • ...that implication X->Y can be written (-X v Y). So AtMost(a, b, c) is a->-b ^ a->-c ^ b->-c. Note the contrapositives come "for free" - if b is assigned then -a is implied. Although this scales n^2 with the number of constraints, binary clauses are extremely cheap for solvers to store and operate on; often suboptimal encodings with many binary clauses outperform optimal encodings that introduce non-binary clauses. For the more sophisticated encodings (e.g. if you really have many constraints), just search around for "CNF Encoding AtMost-k". You will need to test which works best. – GManNickG Oct 20 '21 at 08:15
  • If you can provide a more concrete example problem, I can post an answer. – GManNickG Oct 20 '21 at 08:15
  • @GManNickG Thanks for your reply! I've enriched the original post adding the specific problem description. I'm specifically interested in making up the correct CNF and avoiding NP-busting of clauses. – s0mbre Oct 21 '21 at 04:00
  • Gotcha. Frankly, you'll probably have much better luck using Answer Set Programming. Encoding to SAT is basically encoding to ASP, then translating that to CNF. Expressing this in ASP directly is both easier and faster to solve. See: https://potassco.org/clingo/, for one such tool. If you really want to do CNF, then I would actually suggest doing something like Sudoko to get something easier under your belt, since a lot of this is practice. But, I have personally tried solving some Scrabble boards ("what words were played") in SAT - it's fun to learn, but it's really not practically useful. – GManNickG Oct 21 '21 at 06:20
  • I'll have a look... Although ASP is certainly a new area to me. What I would like more is an efficient way of automating CNF construction from 'naive' booleans, applying the Tseytin transformations as needed. – s0mbre Oct 21 '21 at 21:27
  • Updated CNF task in post. – s0mbre Oct 22 '21 at 03:36

0 Answers0