-1

How can I derive all possible matches of a regular expression

For example:

((a,b,c)o(m,v)p,b)

The strings generated from above expression would be:

aomp

bomp

comp

aovp

bovp

covp

b

Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • Build a DFA accepting the language of the regular expression, do a BFS and only output whenever you're in an accepting state. Store already output words, break off any branches of the BFS that repeat already output words. – G. Bach Mar 05 '15 at 03:22
  • Whatever solution you choose, don't run it on `a+`. – Sebastian Redl Mar 05 '15 at 15:09
  • Thanks G.Bach and Sebastian.The problem has been resolved. – Nilutpal Borgohain Mar 06 '15 at 13:17
  • The comment: "I have managed to partition the strings seperated by commas and run a recursive algorithm but the products get printed in reverse order." had been edited into the question by the OP. I have rolled that back and placed it here to make the question more clear for future readers. – Jonathan Mee Mar 09 '15 at 11:35

1 Answers1

4

Your steps are pretty straight forward though implementing them may take a bit of work:

  1. Create a recursive function which extracts the string between the first set of parenthesis it comes to: https://stackoverflow.com/a/28863720/2642059
  2. In the function split this strings on ',' into a vector<string> and return it: https://stackoverflow.com/a/28880605/2642059
  3. Before returning test if it is necessary to recurse because of a nested parenthesis, one string must be added to the return for each possible combination returned from recursed functions

EDIT:

Say my input string was "(bl(ah,eck,le),yap)"

  • The first function would extract the string: "bl(ah,eck,le),yap"
  • Before returning it would search for nested parenthesis, this would cause it to recurse:
    • The second function would extract the string: "ah,eck,le"
    • Before returning it would search for nested parenthesis and find none
    • It would return an vector<string>: ["ah","eck","le"]
  • The first function would now contain: "bl["ah","eck","le"],yap"
  • It would not find anymore parenthesis to extract, so it would go to expanding all internal combinations: "["blah","bleck","blle"],yap"
  • It could now split the string and return: ["blah","bleck","blle","yap"]

The return from your first function is your result.

EDIT:

Glad you solved it I wrote up a two state machine to solve it as well so I figured I could post it here for your comparison:

const char* extractParenthesis(const char* start, const char* finish){
    int count = 0;

    return find_if(start, finish, [&](char i){
        if (i == '('){
            count++;
        }
        else if (i == ')'){
            count--;
        }
        return count <= 0; });
}

vector<string> split(const char* start, const char* finish){
    const char delimiters[] = ",(";
    const char* it;
    vector<string> result;

    do{
        for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
            it != finish && *it == '(';
            it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
        auto&& temp = interpolate(start, it);
        result.insert(result.end(), temp.begin(), temp.end());
        start = ++it;
    } while (it <= finish);
    return result;
}

vector<string> interpolate(const char* start, const char* finish){
    vector<string> result{ 1, string{ start, find(start, finish, '(') } };

    for (auto it = start + result[0].size();
        it != finish;
        it = find(++start, finish, '('),
        for_each(result.begin(), result.end(), [&](string& i){ i += string{ start, it }; })){
        start = extractParenthesis(it, finish);

        auto temp = split(next(it), start);
        const auto size = result.size();

        result.resize(size * temp.size());

        for (int i = result.size() - 1; i >= 0; --i){
            result[i] = result[i % size] + temp[i / size];
        }
    }
    return result;
}

Depending upon your compiler you'll need to forward declare these since they call each other. This will also crash fantastically if the input string is malformed. And it can't handle escaped control characters.

Anyway you can call it like this:

const char test[] = "((a,b,c)o(m,v)p,b)";
auto foo = interpolate(begin(test), end(test));

for (auto& i : foo){
    cout << i << endl;
}
Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • Could you please give a pseudocode or elaborate step no. 3 – Nilutpal Borgohain Mar 04 '15 at 19:05
  • @NilutpalBorgohain I've expanded the answer a bit, but as mentioned, it will take a bit of work. This is a bad question because it's really asking someone else to do your work for you. If you have further questions I recommend beginning work on this, and when you have something to show ask how you could fix whatever problem you are running into as a new question. – Jonathan Mee Mar 04 '15 at 19:22
  • Thanks Jonathan ,the problem has been resolved. – Nilutpal Borgohain Mar 06 '15 at 13:19
  • 1
    @NilutpalBorgohain Great work man! That was a fun problem. We might get some (more) down votes for posting a solution to a question with no work attached, but I thought it might be helpful to you to look over my work. Let me know if you have questions. Also, If this answer solved your problem please click the check mark next to it. – Jonathan Mee Mar 06 '15 at 13:57