I'm trying to use Python's raw string notation to find a pattern that includes special characters, without success.

When I use the 'r' prefix to ignore the special characters, nothing is found. See the example below:

Problematic Code

import re
pattern = re.compile(r"testing+101@gmail.com")
sentence = '___dsdtesting+101@gmail.comaaa___'

result = re.search(pattern, sentence).group()

print(result)

The above code does not find the pattern, so re.search returns None and the call to .group() raises:

AttributeError: 'NoneType' object has no attribute 'group'

Working Code

When escaping the '+' with '\' it works as expected:

import re
pattern = re.compile("testing\+101@gmail.com")
sentence = '___dsdtesting+101@gmail.comaaa___'

result = re.search(pattern, sentence).group()

print(result)

The above code will return the desired outcome of "testing+101@gmail.com".

Am I using the raw notation wrong? What's going on?

TO CLARIFY: I am not interested in escaping with the '\', rather I want to use the raw notation.

shanik1986
  • You have to escape the `+` as `\+`, or else it matches the preceding `g` one or more times. – The fourth bird May 30 '20 at 08:29
  • You have to escape `.` too; something like `\.` –  May 30 '20 at 08:32
  • Also, do you just want to test if your pattern is in the string? If so, don't use regex, but look at `in` operator. – JvdV May 30 '20 at 08:59
  • 1) I don't need to escape the '.' (see the working code example) 2) I don't want to escape with '\'. I want to use the raw notation, i.e. r"Raw+Notation". – shanik1986 May 30 '20 at 09:00
  • Raw strings ignore *Python* escape sequences. They do not know anything about regex. – MisterMiyagi May 30 '20 at 09:12
  • It is a common misunderstanding. There is no raw notation; there are *string literals* of various types, the raw string literal being one of them. String literals define literal text that you type into your code by hand. When you use variables, you can't "make them raw", because they have already been defined. You need `re.escape` to use a piece of literal text inside a regular expression. – Wiktor Stribiżew May 30 '20 at 10:08

1 Answer

There are two levels of special characters here — those that are special to Python’s string syntax, and those that are special in regular expressions. Using raw strings takes care of the first group, but not the second group.

The plus sign is special in regexes, so to match the string a+ you need the regex a\+. Because the backslash is special to Python strings, if you do not use raw strings you need to type this as 'a\\+'. Using raw strings lets you type r'a\+'.
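As a minimal sketch of these two levels, here is the pattern from the question tried three ways. The variable names are only illustrative, and the dots are escaped as well, as suggested in the comments:

```python
import re

sentence = '___dsdtesting+101@gmail.comaaa___'

# Level 1: the r prefix only changes how Python reads backslashes in the literal.
assert r'a\+' == 'a\\+'                  # same three characters: a, backslash, plus
assert r'testing+101' == 'testing+101'   # no backslashes, so the r prefix changes nothing

# Level 2: the regex engine still reads + as a quantifier ("one or more g").
unescaped = re.search(r'testing+101@gmail.com', sentence)
print(unescaped)           # None

# The unescaped pattern matches repeated g characters instead of a literal plus.
quantified = re.search(r'testing+101', 'testinggg101')
print(quantified.group())  # testinggg101

# Escaping the + (and the dots) inside a raw string matches the literal address.
escaped = re.search(r'testing\+101@gmail\.com', sentence)
print(escaped.group())     # testing+101@gmail.com
```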

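(Because the sequence \+ does not mean anything special to Python, and Python leaves such sequences unchanged, you could actually get away with just 'a\+'. Recent Python versions emit a warning for unrecognized escape sequences like this one, though, so the raw string is the safer choice.)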
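If the goal is to avoid typing backslashes at all, one option (mentioned in the comments) is to let `re.escape` add them. The sketch below assumes the whole address should be matched literally; `address` is just an illustrative name:

```python
import re

address = 'testing+101@gmail.com'
sentence = '___dsdtesting+101@gmail.comaaa___'

# re.escape backslash-escapes the characters that are special in a regex,
# so the result matches the text literally.
escaped_pattern = re.escape(address)
match = re.search(escaped_pattern, sentence)
print(match.group())        # testing+101@gmail.com

# For a plain "is this text in there?" check, no regex is needed at all.
print(address in sentence)  # True
```

`re.escape` is most useful when the literal text arrives in a variable, since a variable cannot be "made raw" after the fact.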

Ture Pålsson