
I need some help.

Is there a common way to generate a unique ID from a regular expression? I need to create an identifier that matches the following regex:

[A-N|P-Z|1-9]{10}

I have no idea where to start.

Regards LStrike

LStrike

3 Answers


To generate a string that matches a specific regexp from the regexp's definition, I would parse the regexp into its automaton (a graph). Then walk the automaton, similar to how regexp matchers work, but instead of matching input, have it write out the edges that it traverses.
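A minimal sketch of that walk in Java. It does not parse anything: the automaton for [A-NP-Z1-9]{10} is built by hand as a chain of 11 states, each non-final state having three edges labelled A-N, P-Z and 1-9 (this reads the question's `|` characters as intended separators, although inside [...] they are literals). The class, record and method names are hypothetical, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class AutomatonWalk {

    /** An edge labelled with an inclusive character range, e.g. 'A'..'N'. */
    record Edge(char min, char max, int dest) {}

    /** A state with its outgoing edges and an accept flag. */
    record State(List<Edge> edges, boolean accept) {}

    /** Hand-built automaton: states 0..length in a chain, only the last accepts. */
    static List<State> buildChain(int length) {
        List<State> states = new ArrayList<>();
        for (int i = 0; i <= length; i++) {
            List<Edge> edges = new ArrayList<>();
            if (i < length) {
                edges.add(new Edge('A', 'N', i + 1));
                edges.add(new Edge('P', 'Z', i + 1));
                edges.add(new Edge('1', '9', i + 1));
            }
            states.add(new State(edges, i == length));
        }
        return states;
    }

    /** Walks from state 0 to an accept state, writing one character per edge taken. */
    static String walk(List<State> states, Random r) {
        StringBuilder out = new StringBuilder();
        State current = states.get(0);
        // In this chain the accept state has no outgoing edges, so we stop there.
        while (!current.accept()) {
            // Weight edges by how many characters they cover, so each character
            // is equally likely (rather than each edge).
            int total = 0;
            for (Edge e : current.edges()) {
                total += e.max() - e.min() + 1;
            }
            int pick = r.nextInt(total);
            for (Edge e : current.edges()) {
                int width = e.max() - e.min() + 1;
                if (pick < width) {
                    out.append((char) (e.min() + pick));
                    current = states.get(e.dest());
                    break;
                }
                pick -= width;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk(buildChain(10), new Random()));
    }
}
```

Choosing a character in proportion to edge width keeps all 34 characters equally likely; picking an edge first and then a character within it would favour the digits. A library that builds the automaton from the regex would replace buildChain.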

Take a look at http://hackingoff.com/compilers/regular-expression-to-nfa-dfa, and give it your regexp. It will then draw the graph that I am referring to.

Having a hunt around the internet for you, I found an open-source Java library that can generate automata from a regexp, which you may be able to use to get started: http://www.brics.dk/automaton/

It looks like http://code.google.com/p/xeger will do this for you.

Chris K

If you don't need to change the regex dynamically and you don't need randomness, I would just create a method that hands out IDs sequentially, starting from 1111111111 and going up to ZZZZZZZZZZ.
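That dispenser could be sketched as an incrementing counter written in base 34, using the allowed characters as digits. The sketch assumes the intended alphabet is 1-9, A-N and P-Z (no `|`); the class name and the in-memory AtomicLong counter are hypothetical, and a real service would need to persist the counter so IDs are not reissued after a restart.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequentialIds {

    // The 34 allowed characters in ascending ASCII order (digits before letters).
    static final String ALPHABET = "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ";
    static final int LENGTH = 10;

    private final AtomicLong counter = new AtomicLong();

    /** Encodes n (0 <= n < 34^10) as a fixed-width, 10-character base-34 ID. */
    static String encode(long n) {
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (n % ALPHABET.length()));
            n /= ALPHABET.length();
        }
        if (n != 0) {
            throw new IllegalStateException("ID space exhausted");
        }
        return new String(out);
    }

    /** Returns the next ID: 1111111111, 1111111112, ..., ZZZZZZZZZZ. */
    public String next() {
        return encode(counter.getAndIncrement());
    }
}
```

Because the alphabet is in ASCII order, the IDs also sort in the order they were issued.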

jamp

Random generation gives no guarantee of uniqueness by construction: only a limited number of IDs satisfy that regex, so two draws can collide, and you should check that each ID is indeed unique before using it. I assume that you want non-sequential IDs (that is, AAAAAAAAAB following AAAAAAAAAA is not desired).
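To put "a limited number of valid IDs" in figures: assuming the intended character set is A-N, P-Z and 1-9 (the `|` in the question's class is a literal and is left out here), there are 14 + 11 + 9 = 34 characters. The standard birthday-bound approximation then estimates the chance of at least one collision after drawing n random IDs:

```latex
\begin{aligned}
N &= 34^{10} = 2\,064\,377\,754\,059\,776 \approx 2.06 \times 10^{15} \\
P_{\text{collision}}(n) &\approx 1 - e^{-n^{2}/(2N)} \\
P_{\text{collision}}(10^{6}) &\approx 1 - e^{-2.4 \times 10^{-4}} \approx 0.024\% \\
P_{\text{collision}}(10^{7}) &\approx 1 - e^{-0.024} \approx 2.4\%
\end{aligned}
```

Collisions are rare at small volumes but not negligible at scale, so the uniqueness check is still worth doing.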

Possible code:

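import java.util.Random;

// Builds a string of `length` characters, each drawn uniformly from `valid`.
String generateID(String valid, int length, Random r) {
    StringBuilder sb = new StringBuilder(length);
    while (sb.length() < length) {
        sb.append(valid.charAt(r.nextInt(valid.length())));
    }
    return sb.toString();
}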

Converting the regex into a string of all valid characters (the valid parameter above) requires parsing the regex in general; but assuming it has the form [list-of-chars]{number-of-chars}, as in the question, you can test each candidate character against the list and keep the ones that match:

// Handles only regexes of the form [list-of-chars]{number-of-chars}.
String generateFromRegex(String regex, Random r) {
    String charsRegex = regex.replaceAll("[{].*", ""); // strip off repetition count, e.g. "{10}"
    StringBuilder valid = new StringBuilder();
    // Test every printable ASCII character ('!' to '~') against the character class.
    // Note: '|' inside [...] is a literal, so [A-N|P-Z|1-9] also accepts '|'.
    for (char c = '!'; c <= '~'; c++) {
        String charString = String.valueOf(c);
        if (charString.matches(charsRegex)) {
            valid.append(charString);
        }
    }
    int length = Integer.parseInt(
            regex.replaceAll(".*[{]", "").replaceAll("[}]", ""));
    return generateID(valid.toString(), length, r);
}

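Note that the Random instance is supplied externally, because you want to use the same instance for all calls. Creating a new Random() for each call is risky: on old JVMs its default seed was just the current time in milliseconds, so several calls in quick succession produced identical sequences and therefore identical "unique" IDs. Newer JVMs seed each new instance differently, but reusing one instance is still the simpler and safer choice.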
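Combining both points (one shared Random, and a uniqueness check before an ID is used), a generator could hold the Random together with a record of issued IDs. This is a sketch: the class name is hypothetical, SecureRandom is a swapped-in choice for less predictable IDs (a plain Random works the same way here), and the in-memory HashSet stands in for whatever persistent store a real system would check against.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class UniqueIdGenerator {

    private final String valid;   // allowed characters
    private final int length;     // ID length
    private final Random random;  // created once, reused for every call
    private final Set<String> issued = new HashSet<>();

    public UniqueIdGenerator(String valid, int length, Random random) {
        this.valid = valid;
        this.length = length;
        this.random = random;
    }

    /** Draws random IDs until one has not been issued before. */
    public synchronized String next() {
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            while (sb.length() < length) {
                sb.append(valid.charAt(random.nextInt(valid.length())));
            }
            String id = sb.toString();
            // Set.add returns false if the ID was already present, so we redraw.
            if (issued.add(id)) {
                return id;
            }
        }
    }

    public static void main(String[] args) {
        UniqueIdGenerator gen = new UniqueIdGenerator(
                "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ", 10, new SecureRandom());
        System.out.println(gen.next());
    }
}
```

If every possible ID were used up, next() would loop forever; with 34^10 possible IDs that is not a practical risk, but a retry limit would be a cheap safeguard.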

tucuxi
  • that is a really nice piece of code. this could be a good starting point for me. thanks a lot. – LStrike Jul 03 '14 at 12:55
  • haven't tested it, though - there may be syntax errors behind every corner – tucuxi Jul 03 '14 at 12:58
  • also, it would be more efficient to use two functions - the first to parse the regex, then you would store the results (arguments for generateID), and then call generateID with those pre-calculated arguments. – tucuxi Jul 03 '14 at 12:59
  • i just tested it and it is working (with little changes). thanks a lot. – LStrike Jul 03 '14 at 13:01
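The two-function idea from the comments might look like this: parse the regex once into its alphabet and length, keep that object, and generate from it as often as needed. It is a sketch that accepts only the [chars]{n} shape; the class and method names are hypothetical. Running it on the question's regex also exposes a subtlety: inside [...] the `|` characters are literals, so [A-N|P-Z|1-9] yields 35 characters including `|`, whereas [A-NP-Z1-9] yields the 34 that were presumably intended.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParsedIdSpec {

    // Matches "[...]{n}", capturing the class body and the repetition count.
    private static final Pattern SHAPE = Pattern.compile("\\[(.+)\\]\\{(\\d+)\\}");

    final String valid;
    final int length;

    private ParsedIdSpec(String valid, int length) {
        this.valid = valid;
        this.length = length;
    }

    /** Step 1 (once): expand the class over printable ASCII and read the count. */
    static ParsedIdSpec parse(String regex) {
        Matcher m = SHAPE.matcher(regex);
        if (!m.matches()) {
            throw new IllegalArgumentException("expected [chars]{n}: " + regex);
        }
        Pattern charClass = Pattern.compile("[" + m.group(1) + "]");
        StringBuilder valid = new StringBuilder();
        for (char c = '!'; c <= '~'; c++) {
            if (charClass.matcher(String.valueOf(c)).matches()) {
                valid.append(c);
            }
        }
        return new ParsedIdSpec(valid.toString(), Integer.parseInt(m.group(2)));
    }

    /** Step 2 (per ID): cheap generation from the stored alphabet and length. */
    String generate(Random r) {
        StringBuilder sb = new StringBuilder(length);
        while (sb.length() < length) {
            sb.append(valid.charAt(r.nextInt(valid.length())));
        }
        return sb.toString();
    }
}
```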