13

In one of the StackOverflow Podcasts (the one where guys were discussing data generation for testing DBs -- either #11 or #12), Jeff mentioned something like "reverse regular expressions", which are used exactly for that purpose: given a regex, produce a string which will eventually match said regex.

What is the correct term for this whole concept? Is this a well-known concept?

Community
  • 1
  • 1
Anton Gogolev
  • 113,561
  • 39
  • 200
  • 288

3 Answers3

6

The Perl module String::Random (in the CPAN) does this. Takes a subset of regular expressions, and does a random walk through it.

Randal Schwartz
  • 39,428
  • 4
  • 43
  • 70
  • Will String::Random necessarily generate all possible values in the solution set? I have a similar problem but I need all possible values. I'm willing to heavily restrict my language. – IceArdor Feb 28 '14 at 06:20
5

Abstract: Recursive transition network (with the postmodernism generator as an interesting example)

One specialization would be your "reverse regex".


As to the terminology: A regular expression is a form of grammar that describes all the words belonging to a specific regular language (namely all the inputs matched by the expression).

Therefore one could call your question: "How can create a random word that matches a given regex" or "How can I obtain a random word belonging to a specified regular language".

Dario
  • 48,658
  • 8
  • 97
  • 130
  • A context free grammar (CFG) is not regular, as can be proved using the pumping lemma. However, the idea of an RTN can still be applied, using a nondeterministic finite-state automaton (NFA) instead. – Thomas May 25 '10 at 13:59
  • 1
    @Thomas: Yes, but a regular one is context free, so the same concept can be applied (being nothing but a specialization). – Dario May 25 '10 at 14:09
0

It absolutely possible to generate data from regular expressions. Some open source projects are under development in this area.

A tutorial about how to generate random password from regex will explain you how it is done. xeger (reverse of regex, an opensource project) is used in the tutorial. Please got through the tutorial to learn more.

brainless
  • 5,698
  • 16
  • 59
  • 82