I am writing an online regex checker that takes a pattern and flags from the user and uses them to compile a regex object. The regex object is then used to check whether the test string matches the pattern. At the moment, the compile function looks like this:

import re

class RegexObject:
    ...

    def compile(self):
        # Start from 0 (no flags) so that |= combines integer bit flags;
        # an empty string here would raise a TypeError
        flags = 0
        if self.multiline_flag:
            flags |= re.M
        if self.dotall_flag:
            flags |= re.S
        if self.verbose_flag:
            flags |= re.X
        if self.ignorecase_flag:
            flags |= re.I
        if self.unicode_flag:
            flags |= re.U

        regex = re.compile(self.pattern, flags)
        return regex

Please note that self.pattern and all the flags are attributes set from user input through a simple form. However, one thing I noticed in the docs is that there is usually an r before the pattern in calls to compile, like this:

re.compile(r'(?<=abc)def')

How do I place that r in my code before my variable name? Also, if I want to tell the user if the test input is valid or not, should I be using the match method, or the search method?

Thanks for any help.

Edit: This question is not a duplicate of this one, because that question has nothing to do with regular expressions.

darkhorse
  • In short: you can't (afaik). Your users need to input the correct strings, or you have to [re.escape()](https://docs.python.org/3/library/re.html#re.escape) the input if they mean everything literally, so that it gets escaped properly. – Patrick Artner Nov 18 '18 at 20:14

2 Answers

Don't worry about the r, you don't need it here.

The r stands for "raw", not "regex". In a raw string, backslashes are taken literally, so you do not have to escape them. Raw strings are often used for regexes because regexes tend to contain many backslashes, and escaping each one is tedious. See this shell output:

>>> s = r"\a"
>>> s2 = "\a"
>>> s2
'\x07'
>>> s
'\\a'
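
As a minimal sketch of why the prefix only matters for literals in source code (the variable names and the test string are made up for illustration, and the user input is simulated by building the string from characters rather than reading a form):

```python
import re

# A raw literal and an escaped normal literal produce the same string:
# backslash, "d", "+"
raw_literal = r"\d+"
escaped_literal = "\\d+"
same = raw_literal == escaped_literal

# Simulated user input: the backslash is already a literal character here,
# exactly as it would be in text typed into a form (chr(92) is "\")
user_pattern = chr(92) + "d+"

# No r prefix is needed (or possible) on a variable
regex = re.compile(user_pattern)
found = regex.search("order 42").group()
```

Here `same` is true and `found` holds the digits from the test string, so a pattern that arrives at run time can be passed to `re.compile` unchanged.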

As for the second question, you should use search, because match only matches at the start of the string. From the docs:

re.search(pattern, string, flags=0)

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
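
A short sketch of the difference, using the lookbehind pattern from the question (the test strings are made up for illustration):

```python
import re

regex = re.compile(r"(?<=abc)def")
text = "abcdef"

# match() only tries position 0, where the lookbehind cannot succeed
match_result = regex.match(text)

# search() scans the whole string and finds "def" after "abc"
search_result = regex.search(text)
is_valid = search_result is not None

# Even with re.M, match() does not try the start of the second line
multiline = re.compile(r"^def", re.M)
multiline_match = multiline.match("abc\ndef")
multiline_search = multiline.search("abc\ndef")
```

With these inputs `match_result` and `multiline_match` are `None`, while both `search` calls return match objects.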

Sweeper
  • Is there any way to escape the string before putting it in the compile function? I tried `re.compile(re.escape("\a"))`, and still the output was `re.compile('\x07')`. – darkhorse Nov 18 '18 at 20:30
  • @darkhorse I probably didn't explain this well enough. You only need `r` when you are writing a string literal in code. The Python interpreter treats backslashes in a normal string literal as escape sequences, which is why you need `r` there. But you said the input is coming from the user. If the user types a backslash, Python will know that it is literally a backslash. So don't worry. – Sweeper Nov 18 '18 at 20:33
  • Got it now, thanks for the answer. – darkhorse Nov 18 '18 at 20:38

You need not use r. Instead, you should use re.escape. Whether to use match or search should again be a user input.

vks