37

I would like to know if there is software that, given a regex and of course some other constraints like length, produces random text that always matches the given regex. Thanks

Paralife

9 Answers

29

Yes, software that can generate a random match to a regex:

Sjoerd
21

Xeger is capable of doing it:

    String regex = "[ab]{4,6}c";
    Xeger generator = new Xeger(regex);
    String result = generator.generate();
    assert result.matches(regex);
Wilfred Springer
  • 3
    Depending on the regular expression used, the randomness will be skewed. For example, the regex '[a-yZ]' will generate about 25 times as many 'Z's as any other single letter. See http://code.google.com/p/xeger/wiki/XegerLimitations – Twilite Sep 18 '13 at 14:14
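
The skew described in the comment above can be reproduced without Xeger. The following is a minimal sketch in Python, assuming a generator that first picks one of the alternatives inside the character class (the range a-y or the literal Z) with equal probability and only then picks a character within it. That model is an assumption made for illustration, not Xeger's actual code, and the function names are hypothetical.

```python
import random
import string
from collections import Counter

# The character class [a-yZ], modelled as two alternatives: a range and a literal.
ALTERNATIVES = [string.ascii_lowercase[:25], "Z"]  # "a".."y", then "Z"


def pick_by_alternative(rng):
    """Pick an alternative uniformly, then a character inside it (skewed)."""
    return rng.choice(rng.choice(ALTERNATIVES))


def pick_by_character(rng):
    """Pick uniformly over all 26 characters of the class (unskewed)."""
    return rng.choice("".join(ALTERNATIVES))


def z_share(picker, samples=100_000, seed=0):
    """Fraction of samples that came out as 'Z' for the given picker."""
    rng = random.Random(seed)
    counts = Counter(picker(rng) for _ in range(samples))
    return counts["Z"] / samples


if __name__ == "__main__":
    print("Z share, per-alternative choice:", z_share(pick_by_alternative))
    print("Z share, per-character choice:  ", z_share(pick_by_character))
```

Under this model, 'Z' makes up roughly half of the output with the per-alternative strategy, against roughly one in 26 when every character is equally likely.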
14

Every regular expression can be expressed as a context-free grammar (CFG), and there is a nice algorithm already worked out for producing random sentences of a given length from any CFG. So convert the regex to a CFG, apply the algorithm, and wham, you're done.

Jay Kominek
  • Any known implementation of the algo? Is this a long shot? – Paralife Nov 08 '08 at 00:10
  • I successfully implemented it in Perl years ago, and it saw 'production' use, so I probably did it right. The hardest part of the process was understanding the notation used in the paper. Clear that hurdle and you're golden. – Jay Kominek Nov 08 '08 at 02:29
  • 1
    If I figure out where the Perl is, I'll cough it up, but don't count on anything. – Jay Kominek Nov 08 '08 at 02:30
  • Hm, couldn't recursive matches (Perl has them) and conditionals work together in creating something that isn't even context-free anymore? – Joey Jan 28 '10 at 14:41
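
For readers who want something concrete for the CFG approach in the answer above, here is a minimal Python sketch of one common technique: count how many derivations of each exact length every symbol has, then choose productions and length splits in proportion to those counts. Whether this is the same algorithm as the one in the paper the answer refers to is an assumption. The grammar (for the regex `(a|bb)*c`) and all function names are hypothetical. The sketch also assumes the grammar has no empty productions and no cycles of unit productions, and the result is uniform over strings only if the grammar is unambiguous.

```python
import random
from functools import lru_cache

# Hypothetical grammar for the regex (a|bb)*c.
# Keys are nonterminals; any other symbol is a one-character terminal.
GRAMMAR = {
    "S": (("a", "S"), ("b", "b", "S"), ("c",)),
}


@lru_cache(maxsize=None)
def count_symbol(symbol, length):
    """Number of derivations of exactly `length` characters from one symbol."""
    if symbol not in GRAMMAR:  # terminal: always exactly one character
        return 1 if length == 1 else 0
    return sum(count_sequence(rhs, length) for rhs in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def count_sequence(symbols, length):
    """Number of derivations of exactly `length` characters from a sequence."""
    if not symbols:
        return 1 if length == 0 else 0
    head, rest = symbols[0], symbols[1:]
    # Each symbol yields at least one character, so the head can take
    # between 1 and length - len(rest) characters.
    return sum(
        count_symbol(head, k) * count_sequence(rest, length - k)
        for k in range(1, length - len(rest) + 1)
    )


def sample_symbol(symbol, length, rng):
    """Expand one symbol into a string of exactly `length` characters."""
    if symbol not in GRAMMAR:
        return symbol
    rules = GRAMMAR[symbol]
    weights = [count_sequence(rhs, length) for rhs in rules]
    rhs = rng.choices(rules, weights=weights)[0]
    return sample_sequence(rhs, length, rng)


def sample_sequence(symbols, length, rng):
    """Expand a sequence of symbols into exactly `length` characters."""
    if not symbols:
        return ""
    head, rest = symbols[0], symbols[1:]
    splits = list(range(1, length - len(rest) + 1))
    weights = [count_symbol(head, k) * count_sequence(rest, length - k) for k in splits]
    k = rng.choices(splits, weights=weights)[0]
    return sample_symbol(head, k, rng) + sample_sequence(rest, length - k, rng)


def random_sentence(length, start="S", seed=None):
    """Return a random sentence of the grammar with exactly `length` characters."""
    if count_symbol(start, length) == 0:
        raise ValueError(f"the grammar has no sentence of length {length}")
    return sample_symbol(start, length, random.Random(seed))


if __name__ == "__main__":
    for n in (1, 4, 9):
        print(n, random_sentence(n))
```

Because the counts are memoized, sampling many strings of the same length reuses the table. The counting step is what lets the length constraint from the question be enforced exactly instead of by rejection.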
8

Check out the RandExp Ruby gem. It does what you want, though only in a limited fashion. (It won't work with every possible regexp, only regexps which meet some restrictions.)

Emil Sierżęga
Pistos
8

If you want a Javascript solution, try randexp.js.

fent
2

It's late, but this may help newcomers: Generex is a useful Java library that provides many features for generating strings from a regex (random generation, generating a string based on its index, generating all matching strings, and so on). Check it out here.

Example:

    Generex generex = new Generex("[0-3]([a-c]|[e-g]{1,2})");

    // Generate the second string, in lexicographical order, that matches the given regex.
    String secondString = generex.getMatchedString(2);
    System.out.println(secondString); // prints '0b'

    // Generate all strings that match the given regex.
    List<String> matchedStrs = generex.getAllMatchedStrings();

    // Walk through the matches with the Generex iterator.
    Iterator iterator = generex.iterator();
    while (iterator.hasNext()) {
        System.out.print(iterator.next() + " ");
    }
    // prints 0a 0b 0c 0e 0ee 0ef 0eg 0f 0fe 0ff 0fg 0g 0ge 0gf 0gg 1a 1b 1c 1e
    // 1ee 1ef 1eg 1f 1fe 1ff 1fg 1g 1ge 1gf 1gg 2a 2b 2c 2e 2ee 2ef 2eg 2f 2fe 2ff 2fg 2g
    // 2ge 2gf 2gg 3a 3b 3c 3e 3ee 3ef 3eg 3f 3fe 3ff 3fg 3g 3ge 3gf 3gg

    // Generate a random string.
    String randomStr = generex.random();
    System.out.println(randomStr); // a random value from the list above
Mifmif
1

We did something similar in Python not too long ago for a RegEx game that we wrote. We had the constraint that the regex had to be randomly generated, and the selected words had to be real words. You can download the completed game EXE here, and the Python source code here.

Here is a snippet:

    import random
    import re

    # words, max_word, gen_regex and problem are defined elsewhere in the game's source.

    def generate_problem(level):
        """Build a puzzle: a random regex plus real words that do and do not match it."""
        while True:
            regex = gen_regex(level)
            counter = 0
            match = 0
            notmatch = 0
            goodwords = []
            badwords = []
            # More words at higher levels, capped at 18.
            num_words = min(2 + level * 3, 18)
            max_word_length = level + 4
            # Draw dictionary words until there are enough matches and non-matches,
            # giving up on this regex after 10000 draws.
            while counter < 10000 and (match < num_words or notmatch < num_words):
                counter += 1
                # randint is inclusive, so max_word must be the last valid index.
                rand_word = words[random.randint(0, max_word)]
                if len(rand_word) > max_word_length:
                    continue
                if re.search(regex, rand_word):
                    match += 1
                    if len(goodwords) < num_words:
                        goodwords.append(rand_word)
                else:
                    notmatch += 1
                    if len(badwords) < num_words:
                        badwords.append(rand_word)
            # Hitting the draw limit means this regex was unsuitable; try another one.
            if counter < 10000:
                new_prob = problem.problem()
                new_prob.title = 'Level ' + str(level)
                new_prob.explanation = 'This is a level %d puzzle. ' % level
                new_prob.goodwords = goodwords
                new_prob.badwords = badwords
                new_prob.regex = regex
                return new_prob
HanClinto
0

Instead of starting from a regexp, you should look into writing a small context-free grammar, which will let you generate such random text easily. Unfortunately, I know of no tool that will do it directly for you, so you would need to write a bit of code yourself to actually generate the text. If you have not worked with grammars before, I suggest you read a bit about BNF and "compiler compilers" before proceeding...

kasperjj
0

I'm not aware of any, although it should be possible. The usual approach is to write a grammar instead of a regular expression, and then create functions for each non-terminal that randomly decide which production to expand. If you could post a description of the kinds of strings that you want to generate, and what language you are using, we may be able to get you started.

Andru Luvisi
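
The grammar-based approach described in the answer above, with one function per non-terminal that randomly decides which production to expand, can be sketched in Python as follows. The grammar (simple arithmetic expressions), the probabilities, and the depth limit are all assumptions chosen for illustration.

```python
import random


def digit(rng):
    # digit -> "0" | "1" | ... | "9"
    return rng.choice("0123456789")


def number(rng):
    # number -> digit | digit number
    if rng.random() < 0.5:
        return digit(rng)
    return digit(rng) + number(rng)


def factor(rng, depth):
    # factor -> number | "(" expr ")"
    # The depth limit forces the non-recursive production once nesting is deep enough.
    if depth <= 0 or rng.random() < 0.7:
        return number(rng)
    return "(" + expr(rng, depth - 1) + ")"


def term(rng, depth):
    # term -> factor | factor "*" term
    if rng.random() < 0.6:
        return factor(rng, depth)
    return factor(rng, depth) + "*" + term(rng, depth)


def expr(rng, depth=3):
    # expr -> term | term "+" expr
    if rng.random() < 0.6:
        return term(rng, depth)
    return term(rng, depth) + "+" + expr(rng, depth)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(expr(rng))
```

Each recursive production is taken with probability below one half and nesting is capped by `depth`, so generation terminates quickly in practice. This style gives no direct control over the output length or over how evenly the possible strings are covered.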