0

Regular expression:

/Hello .*, what's up?/i

String which may contain any number of wildcard characters (%):

"% world, what's up?"    (matches)
"Hello world, %?"        (matches)
"Hello %, what's up?"    (matches)
"Hey world, what's up?"  (no match)
"Hello %, blabla."       (no match)

I have thought of a solution myself, but I'd like to see what you are able to come up with (considering performance is a high priority). A requirement is the ability to use any regular expression; I only used .* in the example, but any valid regular expression should work.

Yeti
  • 2,647
  • 2
  • 33
  • 37
  • If are prejudiced, you may not think out of the box anymore. I want you to come up with a completely different kind of thing, instead of improving my solution. And above all, by solution is far from optimal, if I were satisfied with it, I wouldn't have asked you. – Yeti Jan 10 '14 at 13:22
  • 1
    Currenttly, first and second strings aren't supposed to match your regex. – zessx Jan 10 '14 at 13:31
  • @zessx Of course they aren't, the trick is to replace % in the first string by "Hello". But how would such an algorithm look like? Or another approach would be to edit the Regular expression itself, and then try to match it.. – Yeti Jan 10 '14 at 13:33
  • @Yeti: Your question isn't very clear like what strings should be matched and what should be rejected. – anubhava Jan 10 '14 at 13:46
  • 1
    @Yeti : Nice question . But for the first read it is tough to understand , I think so . I am thinking for a solution...... – Sujith PS Jan 10 '14 at 14:01

3 Answers3

1

A little automata theory might help you here. You say

this is a simplified version of matching a regular expression with a regular expression[1]

Actually, that does not seem to be the case. Instead of matching the text of a regular expression, you want to find regular expressions that can match the same string as a given regular expression.

Luckily, this problem is solvable :-) To see whether such a string exists, you would need to compute the union of the two regular languages and test whether the result is not the empty language. This might be a non-trivial problem and solving it efficiently [enough] may be hard, but standard algorithms for this do already exist. Basically you would need to translate the expression into a NFA, that one into a DFA which you then can union.

[1]: Indeed, the wildcard strings you're using in the question build some kind of regular language, and can be translated to corresponding regular expressions

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • So basically you suggest that I should try to find a string X which is matched by both `Hello .*, what's up?` and `% world, what's up?`. And if I am unable to find such a string, then it is no match. If I do find such a string, then it is a match. Interesting point, I'll think about it... :) – Yeti Jan 10 '14 at 15:16
  • Yes, that's what your matching criteria sounds like - you want to find the wildcard strings from the list that could match what your regular expression does. – Bergi Jan 10 '14 at 15:25
0

Not sure that I fully understand your question, but if you're looking for performance, avoid regular expressions. Instead you can split the string on %. Then, take a look at the first and last matches:

// Anything before % should match at start of the string
targetString.indexOf(splits[0]) === 0;

// Anything after % should match at the end of the string
targetString.indexOf(splits[1]) + splits[1].length === targetString.length;

If you can use % multiple times within the string, then the first and last splits should follow the above rules. Anything else just needs to be in the string, and .indexOf is how you can check that.

Explosion Pills
  • 188,624
  • 52
  • 326
  • 405
  • Sorry, but this question is not trivial! Actually this is a simplified version of matching a regular expression with a regular expression. The title of the question should be clear enough. – Yeti Jan 10 '14 at 13:51
0

I came to realize that this is impossible with a regular language, and therefore the only solution to this problem is to replace the wildcard symbol % with .* and then match two regular expressions with each other. This can however not be done by traditional regular expressions, look at this SO-question and it's answers for details.

Or perhaps you should edit the underlying Regular Expression engine for supporting wildcard based strings. Anyone being able to answer this question by extending the default implementation will be accepted as answer to this question ;-)

Community
  • 1
  • 1
Yeti
  • 2,647
  • 2
  • 33
  • 37