I know that Python uses "r" as the raw string notation in regular expressions.

However, I'd like to apply that in a while loop like:

while i < len(organized_texts) and j < len(frag_texts):
    if re.match(frag_texts[j], organized_texts[i]):
        # If frag_texts[j] matches the beginning of organized_texts[i]
        # Do things

The problem is that frag_texts[j] can contain literal "(" and that's where re.match(frag_texts[j], organized_texts[i]) blows up with error: missing ), unterminated subpattern at position 2.

Apparently I can do neither rfrag_texts[j] nor \frag_texts[j]. I've tried re.match("r'{}'".format(frag_texts[j]), organized_texts[i]) but it gives me the same error too. What options do I have now?

ytu
  • Is `frag_texts` a list of pattern strings from your source code, or a list of pattern strings you loaded out of a file or input from the user or something similar? – abarnert Apr 03 '18 at 03:23
  • If your strings contain unmatched parentheses, then they *aren't regular expressions*, and you should stop treating them as such. Try `if organized_texts[i].startswith(frag_texts[j]):` instead. – jasonharper Apr 03 '18 at 03:25
  • You can use [`re.escape`](https://docs.python.org/3/library/re.html#re.escape) to escape any special characters in a string so you can use it in a regex. But if you're going to escape the entire pattern, there's no reason to use `re` in the first place; just do `if frag_texts[j] in organized_texts[i]:` or `if organized_texts[i].startswith(frag_texts[j]):` or some other simple string operation. If you have a pattern that's made by filling in a template in a raw string literal with user strings, then you want to `re.escape` the user strings. – abarnert Apr 03 '18 at 03:25

1 Answer

Raw strings aren't a different data type; they are just an alternative way to write certain string literals, which makes it simpler to express literal string values in your program code. Since regular expressions often contain backslashes, raw strings are frequently used because they avoid the need to write \\ for each backslash.
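To illustrate the point about raw strings being only a notation, here is a small sketch. The variable names and the sample fragment "ab(c" are made up for the example; they are not taken from the question's data.

```python
# A raw literal and a regular literal can spell exactly the same string value.
raw = r"\d+\(x\)"
plain = "\\d+\\(x\\)"

same_value = raw == plain              # both are the same sequence of characters
same_type = type(raw) is type(plain)   # both are ordinary str objects

# The r prefix exists only in source code. Building the text "r'...'" at runtime
# just adds the characters r and ' to the string; nothing gets escaped.
fragment = "ab(c"                      # hypothetical fragment with a literal "("
wrapped = "r'{}'".format(fragment)

print(same_value, same_type)
print(wrapped)
```

This is why wrapping frag_texts[j] in "r'{}'" still fails: the unbalanced "(" is still in the pattern, now with extra characters around it.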

If you want to match arbitrary text fragments then you probably shouldn't be using regular expressions at all. I'd take a look at the startswith string method, since that just does a character-for-character comparison and is therefore much faster. There's also an equivalent of re.search, should you need it: the in operator.
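A minimal sketch of the plain-string approach, assuming the fragments are meant to be matched literally. The helper names and sample strings are hypothetical.

```python
def starts_with_fragment(text, fragment):
    """Rough equivalent of re.match for literal text: fragment must be at the start."""
    return text.startswith(fragment)


def contains_fragment(text, fragment):
    """Rough equivalent of re.search for literal text: fragment may be anywhere."""
    return fragment in text


organized = "f(x) = 2x + 1"   # hypothetical organized text
frag = "f(x"                  # hypothetical fragment with an unbalanced "("

print(starts_with_fragment(organized, frag))
print(contains_fragment(organized, "(x) ="))
```

Neither helper gives parentheses or any other character a special meaning, so fragments like "f(x" need no preparation.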

You might be interested in this article by a regular expression devotee. Regular expressions are indeed great, but they shouldn't be the first tool you reach for in string matching problems.

If it became necessary for some reason to use regexen, then you'd be interested in the re.escape function, which will quote special characters so they are interpreted as ordinary characters rather than having their usual regex meaning.
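If a regex really is needed, something along these lines should work. It is a sketch only: the helper names, the sample fragment, and the whitespace-prefix template are assumptions made for illustration.

```python
import re


def matches_at_start(text, fragment):
    """Return True if text begins with fragment, treating fragment as literal text."""
    return re.match(re.escape(fragment), text) is not None


def matches_after_whitespace(text, fragment):
    """Template case: a regex part (optional leading whitespace) plus an escaped literal."""
    pattern = r"\s*" + re.escape(fragment)
    return re.match(pattern, text) is not None


frag = "ab(c"   # hypothetical fragment; unescaped, it is not a valid pattern
escaped = re.escape(frag)

print(escaped)
print(matches_at_start("ab(c) and more", frag))
print(matches_after_whitespace("   ab(c) and more", frag))
```

Only the fragment is escaped in the second helper; the hand-written part of the pattern stays a normal regex.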

0m3r
holdenweb