3

I am a newbie to SMT solving and I am writing to inquire some advice and pointers to understand what is a really difficult constraint for SMT solver to solve, for instance Z3.

I tried to tweak the length of bit vectors, for instance in the following way:

>>> a = BitVec("a", 10000)
>>> b = BitVec("b", 10000)
>>> c = a >> 18 + 6 - 32 + 69 == b <<12 + 7 * 32
>>>
>>> s = Solver()
>>> s.add(c)
>>> s.check()

While intuitively this could leads to a quite large search space, it turns out that Z3 still does a pretty good job and solve it swiftly.

I am aware that some crypto hash functions or math formulas (e.g., Collatz conjecture) could make constraint solving quite difficult. But that seems pretty extreme. On the other hand, for instance suppose I have the following constraint:

a * 4 != b + 5

How can I make it more difficult for a constraint solver to solve? Is there any general way of doing so? I got the impression that somehow it a constraint turns to be "non-linear", then it is difficult. But it is still unclear to me how that works.

=================================

Thank you for all the kind notes and insightful posts. I appreciate it very much!

So here are some tentative tests according to @usr's suggestion:

c = BitVec("c", 256)

for i in range(0, 10):
    c = c ^ (c << 13) + 0x51D36335;

s = Solver()
s.add(c == 0xdeadbeef)
print (s.check())
print (s.model())


➜  work time python test.py
sat
[c = 37865234442889991147654282251706833776025899459583617825773189302332620431087]
python test.py  0.38s user 0.07s system 81% cpu 0.550 total
lllllllllllll
  • 8,519
  • 9
  • 45
  • 80
  • If you just want to see Z3 "work really hard" you can try number factorization e.g. `a * b = constant` and input a prime number or a large composite. Or, build a simple hash function and obtain a pre-image. – usr Jan 28 '19 at 18:44
  • @usr Thank you! Could you please shed more lights on the `simple hash function and obtain a pre-image` part? Indeed that's what I was tentatively exploring about, but without too much in-depth understanding in this field, I find it's hard to get the intuition here.. Thank you! – lllllllllllll Jan 28 '19 at 19:49

2 Answers2

2

Bitvector logic is always decidable; so while things may take long, z3 can solve all bitvector problems. Of course, if the bit-vector sizes involved are large, then the solver can take extremely long, or run out of memory before finding a solution. Multiplication and crypto-algorithms are typical examples that always cause a hard time as the bit sizes increase.

On the flip side, we have non-linear integer problems. There's no decision procedure for them, and while z3 "tries its best," such problems are typically beyond scope for theoretical reasons. You can find many instances of these in stack-overflow posts. Here's a most recent one: Z3 returns model not available

alias
  • 28,120
  • 2
  • 23
  • 40
  • Thank you for your answer. Sorry for another naive question, so since non-linear integer problems is so hard, then from a practical perspective, why not modeling everything as "bit vectors"? Our computer is doing everything discrete anyway, right? – lllllllllllll Jan 28 '19 at 20:35
  • Like our programming language, it seems that conventional symbolic execution engine simply model every variables as `bit vectors`... – lllllllllllll Jan 28 '19 at 20:36
  • Most languages these days support arbitrarily large integer values, mapping to SMTLib's unbounded `Int` type. But if you restrict yourself to machine arithmetic only, then yes, BV-solver is your friend. But things like crypto-algorithms will be though. (You can easily ask: Here's the cipher-text and here's the plain-text, find me the key; but it would be really hard for the solver to figure it out because of the state-space explosion. Those sorts of problems essentially reduce to brute-force search, and no solver can really handle them once you have large enough bit-vectors, say > 128-bits.) – alias Jan 28 '19 at 20:47
  • I see the point here! So basically given a symbolic formula, I am thinking to instrument the formula, (hopefully) preserving its semantics, but make it harder for SMT solvers to solve. May I ask if you any intuitions or pointers on that..? Thank you. – lllllllllllll Jan 28 '19 at 21:09
  • You can always conjunct an irrelevant assertion that's hard to establish. That is, pose the question: `my_problem and factor this large bit-vector constant`. That'd do the trick, but I'm guessing you're looking for something more sensible. – alias Jan 28 '19 at 21:13
  • Another idea: Try looking into homomorphic encryption: https://en.wikipedia.org/wiki/Homomorphic_encryption. In theory, you can take your bit-vector computation, transform it into that space, and do the computation. Essentially you'd be performing the same function, except in the encrypted space. That would sufficiently complicate your input problem! (Take this with a grain of salt: The devil is in the details as always.) – alias Jan 28 '19 at 21:18
1

If you just want to see Z3 "work really hard" you can try number factorization e.g. a * b = constant and input a prime number or a large composite.

Or, build a simple hash function and obtain a pre-image:

What I did was to see how SHA-1 is defined. Then, I implemented a simple version of that. SHA-1 consists of very simple operations such as shift, add, xor. From this template you can construct a simple hash function for testing purposes. Then you say y = hash(x) && y = 0x1234 and Z3 will give you x which is the pre-image.

For your entertainment I will make up a simple hash function on the spot:

BitVec currentValue = input;

for (i = 0 to 10)
 currentValue = currentValue ^ (currentValue <<< 13) + 0x51D36335;

BitVec hash = currentValue;

This is an actually functional hash implementation (but not secure). You can play with the operations, number of rounds and bitvector size. You can assert hash = someConstant to obtain a pre-image. For example, let Z3 give you an input that results in a zero hash.

You can apply more fancy constraints as well, for example input == hash or hash % 1234 == 0 && hash & 0xF == 7. Z3 can satisfy any condition at all given enough compute power.

I personally found this capability highly fascinating.

usr
  • 168,620
  • 35
  • 240
  • 369
  • This is cool indeed; such "here's the output, find me the input" sorts of questions can be used for bitcoin mining for instance. But, as you yourself no wonder found out, SMT solvers just cannot deal with these sorts of questions easily; and rightly so. Still, fun to play with! – alias Jan 28 '19 at 21:16
  • @LeventErkok exactly! Even this simple hash algorithm is too much for Z3 to break with higher round numbers. But there are real world hash functions that were broken using SAT solvers. – usr Jan 28 '19 at 21:17
  • 1
    Thank you @usr! I tried the above hash and it seems that Z3 did a fantastic job on it! It can solve a number of tweaks within several seconds on my machine.. – lllllllllllll Jan 28 '19 at 21:20
  • If a SAT solver can find a collision in a hash function, I'd claim that hash function isn't worth its salt (pun intended). People did break old hash functions like MD5 etc., but SHA-3 and the more modern incarnations are safely beyond reach for any tool today, or for the foreseeable future. (Barring some quantum-computing advance that is unlikely but theoretically possible.) – alias Jan 28 '19 at 21:22
  • @lllllllllllll did you use rotation for `<<<`? What bitvector size? I am actually surprised this was so easy. – usr Jan 28 '19 at 21:24
  • @usr What is the corresponding operator in python? I use `<<`. I tried 256 but it is still quite efficient. Let me update my question post. – lllllllllllll Jan 28 '19 at 21:29
  • @usr I updated my question with more info. Am I missed anything here? – lllllllllllll Jan 28 '19 at 21:31
  • `<<` is left shift. This drops a lot of bits and causes less mixing therefore. Use the rotation function that Z3 on Python provides (I use C# so I don't know). Or, `(a << 13 | a >> (bitvectorSize - 13))`. That's equivalent. The additive constant should be the same size as the bitvector size. So probably you should generate a much larger random number or use a more realistic bitvector size such as 32 or 64. – usr Jan 28 '19 at 22:13
  • 1
    @usr Oh, gee, it works! Thank you so much for the clarification! I appreciate it very much. – lllllllllllll Jan 29 '19 at 09:34
  • @lllllllllllll can you please add what you did with execution times ? – whoopdedoo Aug 19 '20 at 22:03