
Clojure has a number of libraries for generative testing, such as test.check, test.generative, and data.generators.

It is possible to use higher-order functions to create composable random data generators, such as:

(defn gen [create-fn content-fn lazy]
  ;; Returns a no-arg generator: calls content-fn once per element of lazy
  ;; and applies create-fn to all the results at once.
  (fn [] (apply create-fn (for [_ lazy] (content-fn)))))

(def a (gen str #(rand-nth [\a \b \c]) (range 10)))
(a) ;; e.g. => "cabacbbac c" without the space; a random 10-char string

(def b (gen vector #(rand-int 10) (range 2)))
(b) ;; e.g. => [3 7]

(def c (gen hash-set b (range (rand-int 10)))) ; size is fixed when c is defined
(c) ;; e.g. => #{[3 7] [0 2] [5 5]}

This is just an example; it could be modified with different parameters, filters, partials, etc., to create quite flexible data-generating functions (see the sketch below).
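For instance, a minimal sketch of a filtering combinator in the same style (gen-such-that is a hypothetical name invented here, not from any library):

(defn gen-such-that [pred g]
  ;; wraps a no-arg generator g, retrying until pred holds for the value
  (fn [] (first (filter pred (repeatedly g)))))

(def even-int (gen-such-that even? #(rand-int 10)))
(even-int) ;; e.g. => 4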

Is there something that any of the generative libraries can do that isn't also just as (or more) succinctly achievable by composing higher-order functions?

As a side note to the Stack Overflow gods: I don't believe this question is subjective. I'm not asking for an opinion on which library is better. I want to know what specific feature(s) or technique(s) of any/all data-generative libraries differentiate them from composing vanilla higher-order functions. An example answer should illustrate generating random data using any of the libraries, with an explanation of why this would be more complex to do by composing HOFs in the way I have illustrated above.

– optevo
1 Answer


test.check does this way better. Most notably, suppose you generate a random list of 100 elements, and your test fails: something about the way you handled that list is wrong. What now? How do you find the basic bug? It surely doesn't depend on exactly those 100 inputs; you could probably reproduce it with a list of just a few elements, or even an empty list if something is wrong with your base case.
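For illustration, here is a minimal sketch of such a failing property written with test.check (the property is deliberately wrong; the namespaces are the library's real ones):

(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Deliberately false claim: generated vectors never contain duplicates.
(def no-duplicates
  (prop/for-all [v (gen/vector gen/int)]
    (= v (distinct v))))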

The feature that makes all this actually useful isn't the random generators, it is the "shrinking" of those generators. Once test.check finds an input that breaks your tests, it tries to simplify the input as much as possible while still making your tests break. For a list of integers, the shrinks are simple enough you could maybe do them yourself: remove any element, or decrease any element. Even that may not be true: choosing the order to do shrinks in is probably a harder problem than I realize. And for larger inputs, like a list of maps from vectors to a 3-tuple of [string, int, keyword], you'll find it totally unmanageable, whereas test.check has done all the hard work already.
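Running the sketch above shows shrinking in action (exact values vary by seed; output abbreviated):

(tc/quick-check 100 no-duplicates)
;; => {:result false,
;;     :fail   [[5 -9 5 0 12]]        ; whatever random input failed first
;;     :shrunk {:smallest [[0 0]]}}   ; the minimal counterexample

And that scary composite input is just a composition of built-in generators, each of which already knows how to shrink:

(gen/list (gen/map (gen/vector gen/int)
                   (gen/tuple gen/string-alphanumeric gen/int gen/keyword)))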

– amalloy
  • Does the shrinking algorithm actually depend upon the data generator? I could see how it might, but it could also be useful (albeit perhaps not practical/easy) for it to be independent. For example, if I have an invariant assertion that fails on regular (i.e., non-test) data, it might be very handy to be able to shrink the inputs back along the stack trace. – optevo Oct 08 '14 at 23:16
  • It does depend on the generators. There's a lot of plumbing involved, but basically a generator includes functions for producing output and for shrinking it. You can't really run a shrinker on arbitrary data without having a model for what produced it. – amalloy Oct 09 '14 at 00:00
  • This is pretty much the difference between test.check and test.generative. test.generative is basically @optevo's implementation, but polished. It's easier to extend and customize because of that. While test.check is awesome because of its shrinking capability, writing your own generators for it is harder. – Didier A. May 12 '16 at 19:24