28

Yup, you read that right. I need something that is capable of generating random text from a regular expression. The text should be random, but still be matched by the regular expression. It seems no such library exists, but I could be wrong.

Just an example: such a library would take '[ab]*c' as input and generate samples such as:

abc
abbbc
bac

etc.
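
To make the requirement concrete: every generated sample has to match the whole pattern, not just contain a matching substring. A minimal sketch of that check, using only java.util.regex (the class name SampleCheck and the helper allMatch are made up for illustration):

```java
import java.util.List;
import java.util.regex.Pattern;

class SampleCheck {
    // Returns true only if every sample is matched in full by the regex.
    static boolean allMatch(String regex, List<String> samples) {
        Pattern pattern = Pattern.compile(regex);
        return samples.stream().allMatch(s -> pattern.matcher(s).matches());
    }

    public static void main(String[] args) {
        // The three samples from the question, checked against '[ab]*c'.
        System.out.println(allMatch("[ab]*c", List.of("abc", "abbbc", "bac")));
    }
}
```

Any generator that fits the description should pass this kind of check for every string it produces.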

Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.

Wilfred Springer
  • 2
    Cool idea - interested to hear the results. – Ryall Oct 16 '09 at 15:27
  • 1
    This would indeed be quite useful! – p3t0r Oct 16 '09 at 15:28
  • 1
    I think any "...or more" selectors would have to be limited though or you could end up with 1,000,000 character words :S – Ryall Oct 16 '09 at 15:35
  • I don't think such a library exists. You could look into the Perl String::Random module, which implements something similar for a restricted subset of patterns – jitter Oct 16 '09 at 15:37
  • 1
    You know the saying about the monkeys that could write Shakespeare (Infinite Monkey Theorem) ... well, a quick and dirty solution: generate random strings until you have one that matches. That could take a while :-). I would like to see a real reply though. – vdr Oct 16 '09 at 15:39
  • 1
    This sounds like it might be an interesting little project. – Herms Oct 16 '09 at 15:46
  • I just created Xeger, a library that allows you to generate text from regular expressions. It's hosted here: http://code.google.com/p/xeger/ – Wilfred Springer Oct 17 '09 at 17:31
  • Same question here: [http://stackoverflow.com/questions/274011/random-text-generator-based-on-regex](http://stackoverflow.com/questions/274011/random-text-generator-based-on-regex) I haven't tried it. Good question! – sinuhepop Oct 16 '09 at 15:50
  • Trying to see if I can use Ruby Randexp running using JRuby, and get some support for it in Java that way. – Wilfred Springer Oct 17 '09 at 10:06
  • Keep in mind that Java 7 will be able to execute Ruby natively. – sinuhepop Oct 17 '09 at 11:15
  • Your lib is really, really *cool* ! Thanks ! – Benj Oct 05 '15 at 15:36

5 Answers

17

I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)

This is the way you use it:

import nl.flotsam.xeger.Xeger;

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex); // the generated text always matches the pattern
Wilfred Springer
7

I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:

  1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).

  2. Use the result to construct an NFA.

  3. (Optional) Convert the NFA to a DFA.

  4. Randomly traverse the resulting automaton from the start state to any accepting state, recording the character emitted by each transition.

The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
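
Step 4 can be sketched without any library if the automaton is written out by hand. The example below hard-codes a two-state DFA for '[ab]*c' (so steps 1 to 3 are skipped) and walks it randomly. The class name RandomWalkSketch, the table layout, and the maxLength cap are all assumptions made for this illustration; the cap is there so that starred parts cannot grow without bound, and walks that hit it before being accepted are simply retried.

```java
import java.util.Random;

class RandomWalkSketch {
    // Hand-built DFA for "[ab]*c": state 0 is the start state, state 1 is accepting.
    // SYMBOLS[s][i] is the character on the i-th transition out of state s,
    // TARGETS[s][i] is the state that transition leads to.
    static final char[][] SYMBOLS = { {'a', 'b', 'c'}, {} };
    static final int[][] TARGETS = { {0, 0, 1}, {} };
    static final boolean[] ACCEPTING = { false, true };

    // One random walk from the start state. Returns the accepted word,
    // or null if the walk hits a dead end or the length cap first.
    static String walk(Random random, int maxLength) {
        StringBuilder out = new StringBuilder();
        int state = 0;
        while (true) {
            int moves = SYMBOLS[state].length;
            // In an accepting state, "stop here" is one more equally likely option.
            int options = moves + (ACCEPTING[state] ? 1 : 0);
            if (options == 0) return null;            // non-accepting dead end
            int pick = random.nextInt(options);
            if (pick == moves) return out.toString(); // chose to stop (accepting states only)
            if (out.length() == maxLength) return ACCEPTING[state] ? out.toString() : null;
            out.append(SYMBOLS[state][pick]);         // record the character on the transition
            state = TARGETS[state][pick];
        }
    }

    // Retries until a walk ends in an accepting state within maxLength characters.
    static String generate(Random random, int maxLength) {
        if (maxLength < 1) throw new IllegalArgumentException("the shortest word of [ab]*c has length 1");
        while (true) {
            String word = walk(random, maxLength);
            if (word != null) return word;
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) System.out.println(generate(random, 8));
    }
}
```

Picking transitions uniformly is the simplest choice, not the only one; it favors short words here because the 'c' transition is taken with probability 1/3 at every step.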

Stephan202
2

Here are a few implementations of such a beast, but none of them are in Java (and all but the closed-source Microsoft one are very limited in their regexp feature support).

Michael Borgwardt
2

Based on Wilfred Springer's solution together with http://www.brics.dk/~amoeller/automaton/, I built another generator. It does not use recursion. It takes as input the pattern (regular expression), a minimum string length, and a maximum string length. The result is an accepted string whose length lies between the minimum and the maximum. It also allows some of the XML "shorthand character classes". I use this for an XML sample generator that builds valid strings for facets.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.State;
import dk.brics.automaton.Transition;

public static final String generate(final String pattern, final int minLength, final int maxLength) {
    // Expand the shorthand character classes before handing the pattern to dk.brics.automaton.
    final String regex = pattern
            .replace("\\d", "[0-9]")        // \d = digit
            .replace("\\w", "[A-Za-z0-9_]") // \w = word character
            .replace("\\s", "[ \t\r\n]");   // \s = whitespace
    final Automaton automaton = new RegExp(regex).toAutomaton();
    final Random random = new Random();
    final List<String> validLength = new ArrayList<>(); // accepted prefixes seen during the walk
    final StringBuilder builder = new StringBuilder();
    State state = automaton.getInitialState();
    while (true) {
        // Record every accepted prefix that is long enough, including the one at the final state.
        if (state.isAccept() && builder.length() >= minLength) validLength.add(builder.toString());
        final Transition[] transitions = state.getSortedTransitionArray(true);
        if (builder.length() >= maxLength || transitions.length == 0) break;
        final Transition t = transitions[random.nextInt(transitions.length)]; // random transition
        builder.append((char) (t.getMin() + random.nextInt(t.getMax() - t.getMin() + 1))); // random char in its range
        state = t.getDest();
    }
    if (validLength.isEmpty()) throw new IllegalArgumentException(automaton + " , " + minLength + " , " + maxLength);
    return validLength.get(random.nextInt(validLength.size()));
}
SkateScout
0

Here is a Python implementation of such a module: http://www.mail-archive.com/python-list@python.org/msg125198.html. It should be portable to Java.

Björn Lindqvist