2

Given a regular expression, is is possible to find a string that matches that expression programmatically? If so, please mention an algorithm for that, assuming that a string exists.

Bonus question: Give the performance/complexity of that algorithm, if able.


PS: Note I am not asking this: Programmatically derive a regular expression from a string. More likely I am asking the reserve problem.

gsamaras
  • 71,951
  • 46
  • 188
  • 305
  • As far I can understand the question, you are looking for algorithm which generate regular expression with given set of string combination?? Is that you are looking for. – Simmant Sep 14 '17 at 08:41
  • No @Simmant, given a regular expression, I want a string that matches that expression. – gsamaras Sep 14 '17 at 08:59
  • Ok got it, it more like searching algorithms if I am not wrong. – Simmant Sep 14 '17 at 09:32

3 Answers3

3

Generex is a Java library for generating String from a regular expression.

Check it out: https://github.com/mifmif/Generex

Here is the sample Java code demonstrating library usage:

Generex generex = new Generex("[0-3]([a-c]|[e-g]{1,2})");

// Generate random String
String randomStr = generex.random();
System.out.println(randomStr);// a random value from the previous String list

// generate the second String in lexicographical order that match the given Regex.
String secondString = generex.getMatchedString(2);
System.out.println(secondString);// it print '0b'

// Generate all String that matches the given Regex.
List<String> matchedStrs = generex.getAllMatchedStrings();

// Using Generex iterator
Iterator iterator = generex.iterator();
while (iterator.hasNext()) {
    System.out.print(iterator.next() + " ");
}
// it prints:
// 0a 0b 0c 0e 0ee 0ef 0eg 0f 0fe 0ff 0fg 0g 0ge 0gf 0gg
// 1a 1b 1c 1e 1ee 1ef 1eg 1f 1fe 1ff 1fg 1g 1ge 1gf 1gg
// 2a 2b 2c 2e 2ee 2ef 2eg 2f 2fe 2ff 2fg 2g 2ge 2gf 2gg
// 3a 3b 3c 3e 3ee 3ef 3eg 3f 3fe 3ff 3fg 3g 3ge 3gf 3gg

Another one: https://code.google.com/archive/p/xeger/

Here is the sample Java code demonstrating library usage:

String regex = "[ab]{4,6}c"; 
Xeger generator = new Xeger(regex); 
String result = generator.generate(); 
assert result.matches(regex);
Duc Filan
  • 6,769
  • 3
  • 21
  • 26
1

Assume you define regular expressions like this:

R :=
   <literal string>
   (RR)    -- concatenation
   (R*)    -- kleene star
   (R|R)   -- choice

Then you can define a recursive function S(r) which finds a matching string:

S(<literal string>) = <literal string>
S(rs) = S(r) + S(s)
S(r*) = ""
S(r|s) = S(r)

For example: S(a*(b|c)) = S(a*) + S(b|c) = "" + S(b) = "" + "b" = "b".

If you have a more complex notion of regular expression, you can rewrite it in terms of the basic primitives and then apply the above. For example, R+ = RR* and [abc] = (a|b|c).

Note that if you've got a parsed regular expression (so you know its syntax tree), then the above algorithm takes at most time linear in the size of the regular expression (assuming you're careful to perform the string concatenations efficiently).

Paul Hankin
  • 54,811
  • 11
  • 92
  • 118
1

To find given expression in string which fit under that criteria, for that I had tried below algorithm.

i) Create the array for all strings available in given source.

ii) Create a function with parameters for array, expression and initial index count.

iii) Call function recursively and increase the index with every move, until we match string has not found.

iv) Return/break the function if String with desired expression is found.

Below is same java code:

public class ExpressionAlgo {

    public static void main(String[] args) {
        // TODO Auto-generated method stub

        String data = "A quantifier defines how often an element can occur. The symbols ?, *, + and {} define the quantity of the regular expressions";
        regCheck(data.split(" "), "sym", 0); 
    }

    public static void regCheck(String[] ar, String expresion, int i) {
               if(ar[i].contains(expresion)){
                   System.out.println(ar[i]);
                   return;
               }

               if(i<ar.length-1){
                   i=i+1;
                   regCheck(ar, expresion, i);
               }
    }
}

As far as I calculated the complexity of this code is N^3 because I had use split, contains method and call regCheck method recursively.

Simmant
  • 1,477
  • 25
  • 39