4

Let's say that two regular expressions e1 and e2 collide if there exists any string s, such that both e1 and e2 match s.

Is there any easy (efficient) way to check if two regular expressions collide without iterating over the set of all possible strings in our dictionary?

Note 1: I don't know if this is called in some other manner in the literature. Maybe I'm just lacking of the proper name to search this.

Note 2: The ideal answer for me is written PHP code, but I accept any suggestion, not necessarily PHP.

NublicPablo
  • 959
  • 11
  • 21
  • possible duplicate of [Regular Expressions: Is there an AND operator?](http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator) – Jay Blanchard Nov 17 '14 at 17:38
  • Generally, its a fair bet that since you can't test every combination of characters that make up a string, you can't compare two regex to see if they would match the same string. I say generally because the regex could be almost identical, or broad in scope that visually it could be spotted. –  Nov 17 '14 at 17:39
  • 2
    I'm sorry, but I don't see any duplication. Using an AND operator would require to iterate through the whole dictionary of strings, and the question explicitly ask not to do that. – NublicPablo Nov 17 '14 at 17:45
  • What if you implemented something to where you matched against only one of the regex, and if there is a match, check against your other regex. You can loop through your dictionary checking only against one of them until you have a hit, as to not check everything against both patterns. – kayleeFrye_onDeck Nov 17 '14 at 22:23
  • I'm not sure you can do such thing. Suppose `e1 = /[a-z]/` and `e2 = /[0-9]/` they both match `s = 'a1'` but you can't say `e1 matches same thing that e2` – Toto Nov 18 '14 at 08:26

1 Answers1

1

So, after further research, it looks like this is called regular expression intersection in the literature.

This is possible and apparently it is not difficult to implement, but it seems that there is no official PHP support.

The key to the implementation of an easy algorithm relies on translating the regular expressions into a finite automaton. Read attached links to a better understanding of the solution.

Stackoverflow related questions:

Intersection of two regular expressions

Calculate if two infinite regex solution sets don't intersect

Unofficial library for PHP:

https://github.com/KendallHopkins/FormalTheory

Edit: Adding code snippet to check intersections using Kendall Hopkins library:

function doRegexIntersection($regex_string_1, $regex_string_2) {
    $lexer = new FormalTheory_RegularExpression_Lexer();
    $nfa1 = $lexer->lex( $regex_string_1 )->getNFA();
    $nfa2 = $lexer->lex( $regex_string_2 )->getNFA();
    return FormalTheory_FiniteAutomata::intersection( $nfa1, $nfa2 )->validSolutionExists();
}
Community
  • 1
  • 1
NublicPablo
  • 959
  • 11
  • 21
  • By _intersection_, do you mean that two regular expression possibly could match the same text ? If so, does that mean there doesn't exist a string that would be matched by one and not the other ? I guess the final question is what can this information get you ? –  Nov 18 '14 at 17:12
  • Yes, if the intersection is a valid automaton that means that there is at least 1 string that is matched with both regular expressions. If intersection is empty (ie, intersection automaton can't reach an end state), then such string does not exist. I don't fully understand your last question. This is the solution I was looking for, I had two objects with their behaviour defined by a regex, and I needed to check whether they are disjoint or not. – NublicPablo Nov 18 '14 at 18:14
  • So if there is at least 1 string matched by both, does that mean there are other strings matched by 1 and not the other? –  Nov 18 '14 at 19:07
  • 1
    Not necessarily. You can have, for example, two equivalent regex, they both would match exactly the same subset of strings. Or you could have one regex contained in the other. Imagine `e1 = ^a.*$` and `e2 = ^.*$`, then there exists a string matched by both (for example `aa`), but you can't find a string matched by `e1` that is not matched by `e2`. – NublicPablo Nov 20 '14 at 09:02
  • Ok, given that if two regex can reach a common end state, what optimization's could you do with that info when it is possible that one matches a string and one doesn't? –  Nov 20 '14 at 19:46
  • I'm afraid that's out of the scope of this question, and I don't know the answer. If optimizing anything, this question is about optimizing the way we find the intersection between two regex. Not using that info to do something else. – NublicPablo Nov 21 '14 at 14:12
  • Its a nice exercise. In the end though, of what practical use is it. –  Nov 21 '14 at 17:24