0

I'm trying to find all the characters that are required in the regex (it will never match without them). I would like all characters to be returned in a list. For example (a)+ would return ["a"] and c(b)+ would return ["c","b"]. However, [ab]+ would return [] because either character is matched, but both don't need to be.

I tried

def reg_chars(reg):
    import re
    chars = list( '!"#$%&\'()*,-./0123456+789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~')
    for char in chars:
        if re.match(reg,char):
            return char

however it is very inefficient, can only match one character, and never works if there is anything like [ab]+ in the regex.

  • How do you know that is more inefficient than it has to be? – Scott Hunter Jun 22 '23 at 18:30
  • @ScottHunter It isn't really now, but this code would be too slow for a longer regex because it is just looping. – Alex Defanano Jun 22 '23 at 18:35
  • "too slow" is a judgement; there is a limit to how fast any particular problem can be solved, regardless of how fast you *want* it to be solved. – Scott Hunter Jun 22 '23 at 18:37
  • @ScottHunter true, but there must be a more efficient way to do this, is what I mean – Alex Defanano Jun 22 '23 at 18:39
  • 1
    You might find this interesting: https://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python Unfortunately, it seems in newer Python version, the `re.sre_parse` module no longer exists – user2390182 Jun 22 '23 at 18:41
  • 1
    How perfect does this have to be? A regex can become rather complex with escaped special characters and the like. If you reduce the features you want to allow, e.g. allow +, *, ?, [], [^], and no escaped special characters, this might become feasible and still suit your needs. – user2390182 Jun 22 '23 at 18:46
  • Would an accurate paraphrase of your problem be "find the set of characters that appear in every string matched by a given regex"? – Scott Hunter Jun 22 '23 at 18:49
  • @user2390182 that should be fine. – Alex Defanano Jun 22 '23 at 18:51
  • @ScottHunter That seems pretty accurate. – Alex Defanano Jun 22 '23 at 18:52
  • 2
    You would need to write a regex parser. That is not a simple thing. – trincot Jun 22 '23 at 19:20
  • 1
    This is sounding like an NP problem. – Scott Hunter Jun 22 '23 at 19:40
  • I agree with trincot. It's hard to imagine something short of a parser, to cope with complex expressions. Consider following examples `(?!a)a`, `(?=a).*`, `(?=[e-o])[a-e]`, `(?=[^e-o])[e-p]`, `[ab]((?<=a)c|(?<=b)a)`, `(.*)(?(1)c)` – markalex Jun 22 '23 at 20:36
  • I think you wish to find the shortest string matched by the regex and unique(string) on it. To find a string matched by a regex is an NP-hard problem to start with. – hacker315 Jun 24 '23 at 13:22

0 Answers0