Is there a practical way to determine whether two (Python) regular expressions are (likely to be) mutually exclusive?

Question

TL;DR: There's no generic solution, but is there something better than a simple (Python) string equality test for identical regexes?

From reading this answer, I gather that a generic solution is theoretically impossible.

However, I work on AutoKey, a Python application that compares a window's title and/or class to more than one regex to determine which of several actions to take when AutoKey is triggered by a particular trigger. See this for the actual issue.

Currently, when a new action is added, its associated regex is checked against all other actions with the identical trigger.

The test in use now is a simple string compare of the two regexs. If they are identical, the new action is not allowed to be added.

This very weak test has proven suficient most of the time, but if it fails, it results in undefined behavior - (probably) the first action discovered to match is used where the search order is implementation dependent.

Most of the regexes used in practice are either pure constant literal strings like kate.kate or things like .*app_title.* and .*ivaldi.*|.*brave.*. They only get weird when a negative window filter is needed (to match all windows except those whose title or class match the expression).

So, given that a perfect solution is unavailable is there some half way measure that will at least be better than a simple equality test?

Is there something that could estimate the likelyhood that two regexes are not mutually exclusive or would at least detect trivial things like .*string being equivalent to .*.*string

After writing this, I came up with a big improvement in the form of a purely empirical heuristic.

I still want to know if there's a more elegant solution/approach.

Note: I am not expert in any of this, so any suggested improvement has to be fairly simple to understand and implement for it to be useful.

Are you allowing arbitrary Python regexes, or are you restricting them to the subset of real regular expressions (ie., no lookaheads, non-greedy operators, etc)? — chepner, Feb 28 '23 at 14:45
Have you thought about testing those regexes to a large random text document and comparing the results (number of matches, or even better the indexes of the matches) to hopefully determine if they are mutually exclusive or not? — RifloSnake, Mar 01 '23 at 12:48
@chepner We don't really need anything fancy, but I don't believe there's currently any restriction on what the user can type in that field. Is there an easy way to do that without writing some intricate parsing? — Joe, Mar 03 '23 at 17:12
There are theoretical issues, in that unrestricted regexes recognize languages for which the problem of deciding if they are identical is undecidable. — chepner, Mar 03 '23 at 17:14
@RifloSnake That's an interesting idea. I came up with [something](https://github.com/autokey/autokey/issues/365#issuecomment-14482621850) a bit similar to that after posting this question. Maybe we can modify that to be a bit more like your idea. — Joe, Mar 03 '23 at 17:23
The hardest part of my idea is to find a proper randomized large text. If the text is good, no complex calculations are needed, the numbers will show itself if they are mutually exclusive or not.@Joe — RifloSnake, Mar 03 '23 at 17:28
@chepner. I get that. That's why I asked the question the way I did. I'm trying to make it a bit more reliable. See my comment above for what I'm thinking now. We have been getting by without anything fancy till now. The user who provides the regex is usually the one who will use the event as well, so there's really not much of an adversarial environment to deal with. — Joe, Mar 03 '23 at 17:31
@RifloSnake I'm not sure that's necessary if the user just provides a few things that should match one and not the others. I just have to try them. The input comes from window titles or classes, so, hopefully, there aren't a lot of bizarre edge cases to deal with. Usually, there will only be at most, a handful of different regexes to test against the input because most actions won't have the same triggers and so can't collide to begin with. — Joe, Mar 03 '23 at 17:45
I've worked witih regexes a bit, but nothing intricate. Maybe I can just do a pattern match on the candidate regex and if it contains certain tokens, either reject it or warn the user to review what they're doing. The problem is that some of those might conceivably be plain text parts of the window title or class, so I'd have to know if they were escaped, etc. whcih gets back to fancier parsing. — Joe, Mar 03 '23 at 17:54

Is there a practical way to determine whether two (Python) regular expressions are (likely to be) mutually exclusive?

0 Answers0