Can Z3 be used to reason about substrings?

Question

I am trying to use Z3 to reason about substrings, and have come across some non-intuitive behavior. Z3 returns 'sat' when asked to determine if 'xy' appears within 'xy', but it returns 'unknown' when asked if 'x' is in 'x', or 'x' is in 'xy'.

I've commented the following code to illustrate this behavior:

(set-logic AUFLIA)
(declare-sort Char 0)

;characters to build strings are _x_ and _y_
(declare-fun _x_ () Char)
(declare-fun _y_ () Char)
(assert (distinct _x_ _y_))

;string literals
(declare-fun findMeX () (Array Int Char))  
(declare-fun findMeXY () (Array Int Char))  
(declare-fun x () (Array Int Char))
(declare-fun xy () (Array Int Char))
(declare-fun length ( (Array Int Char) ) Int )

;set findMeX = 'x'
(assert (= (select findMeX 0) _x_))
(assert (= (length findMeX) 1))

;set findMeXY = 'xy'
(assert (= (select findMeXY 0) _x_))
(assert (= (select findMeXY 1) _y_))
(assert (= (length findMeXY) 2))

;set x = 'x'
(assert (= (select x 0) _x_))
(assert (= (length x) 1))

;set xy = 'xy'
(assert (= (select xy 0) _x_))
(assert (= (select xy 1) _y_))
(assert (= (length xy) 2))

Now that the problem is set up, we try to find the substrings:

;search for findMeX='x' in x='x' 

(push 1)
(assert 
  (exists 
    ((offset Int)) 
    (and 
      (<= offset (- (length x) (length findMeX))) 
      (>= offset 0) 
      (forall 
        ((index Int)) 
        (=> 
          (and 
            (< index (length findMeX)) 
            (>= index 0)) 
          (= 
            (select x (+ index offset)) 
            (select findMeX index)))))))

(check-sat) ;'sat' expected, 'unknown' returned
(pop 1)

If we instead search for findMeXY in xy, the solver returns 'sat', as expected. It would seem that since the solver can handle this query for 'xy', it should be able to handle it for 'x'. Further, if searching for findMeX='x' in 'xy='xy', it returns 'unknown'.

Can anyone suggest an explanation, or perhaps an alternate model for expressing this problem within a SMT solver?

I tried to reproduce the problem you described without success. Which version of Z3 are you using? I tried using Z3 2.19 (Windows and Linux), and the Z3 3.0 (available in the SMT-COMP website) and all of them returned sat for the query above. Thanks, Leo. — Leonardo de Moura, Aug 19 '11 at 22:24
Thanks for the comment, Leo, and you made me realize that I left something out of the question. It seems that the behavior I was experiencing occurs only when the final assertion and check-sat appear between `push` and `pop` commands, which I had been using to try various experiments within the same set of assertions. I've modified the question accordingly, though now it appears that the issue is with scoping. If `push` and `pop` are left out, then you're right, 'sat' is returned in all cases. — Katie, Aug 20 '11 at 20:36
May be you are searching for this: https://www.cs.purdue.edu/homes/zheng16/str/ — user, Jan 16 '14 at 18:17

score 7 · Accepted Answer · answered Aug 22 '11 at 21:54

The short answer for the observed behavior is: Z3 returns ‘unknown’ because your assertions contain quantifiers.

Z3 contains many procedures and heuristics for handling quantifiers. Z3 uses a technique called Model-Based Quantifier Instantiation (MBQI) for building models for satisfying queries like yours. The first step is this procedure consists of creating a candidate interpretation based on an interpretation that satisfies the quantifier free assertions. Unfortunately, in the current Z3, this step does not interact smoothly with the array theory. The basic problem is that the interpretation built by the array theory cannot be modified by this module.

A fair question is: why does it work when we remove the push/pop commands? It works because Z3 uses more aggressive simplification (preprocessing) steps when incremental solving commands (such as push and pop commands) are not used.

I see two possible workarounds for your problem.

You can avoid quantifiers, and keep using array theory. This is feasible in your example, since you know the length of all “strings”. Thus, you can expand the quantifier. Although, this approach may seem awkward, it is used in practice and in many verification and testing tools.
You can avoid array theory. You declare string as an uninterpreted sort, like you did for Char. Then, you declare a function char-of that is supposed to return the i-th character of a string. You can axiomatize this operation. For example, you may say that two strings are equal if they have the same length and contain the same characters:


(assert (forall ((s1 String) (s2 String))
                (=> (and 
                     (= (length s1) (length s2))
                     (forall ((i Int))
                             (=> (and (<= 0 i) (< i (length s1)))
                                 (= (char-of s1 i) (char-of s2 i)))))
                    (= s1 s2))))

The following link contains your script encoded using the second approach: http://rise4fun.com/Z3/yD3

The second approach is more attractive, and will allow you to prove more complicated properties about strings. However, it is very easy to write satisfiable quantified formulas that Z3 will not be able to build a model. The Z3 Guide describes the main capabilities and limitations of the MBQI module. It contains may decidable fragments that can be handled by Z3. BTW, note that dropping array theory will not be a big problem if you have quantifiers. The guide shows how to encode arrays using quantifiers and functions.

You can find more information about how MBQI works in the following papers:

Can Z3 be used to reason about substrings?

1 Answers1