0

I understand that the 'r' prefix marks a raw string. Is that why the 'r' prefix is used in the following example: because the string contains special regex characters that should not be taken literally?

The 'string' being searched is an nltk Text object, and I suppose that has something to do with it. However, I don't understand how it affects the usage of findall.

moby.findall(r"<a> (<.*>) <man>")
czolbe

2 Answers

3

In this particular case, r makes no difference, because the string contains no backslash sequences that Python could interpret as escapes. However, it is a good habit to use r when writing regular expressions, to avoid misinterpretation of sequences like \n or \t. With r, they are kept literally as two characters (a backslash followed by a letter); without r, they evaluate to a newline and a tab, respectively.
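A minimal sketch of that difference, using only the standard library. The variable names and sample text are made up for illustration. The `\b` case is where it matters most for regexes: in a normal string it becomes a backspace character, while in a raw string it reaches the regex engine as a word-boundary marker.

```python
import re

# Without r, Python converts \t and \n before the regex engine sees them.
plain = "a\tb\n"    # a, TAB, b, NEWLINE
raw = r"a\tb\n"     # a, \, t, b, \, n

plain_len = len(plain)  # 4
raw_len = len(raw)      # 6

# "\b" in a normal string is a backspace character (\x08);
# r"\b" is passed through to the regex engine as a word boundary.
text = "cat concat"
plain_hits = re.findall("\bcat\b", text)   # [] (looks for backspace characters)
raw_hits = re.findall(r"\bcat\b", text)    # ['cat']
```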

Błotosmętek
  • Oh, and this has nothing to do with characters that have special meaning in regexps (like *, +, ? etc.) - these need to be escaped in regexps using backslash: `\*` to be treated literally, but `r` has no effect on them. – Błotosmętek Jun 14 '17 at 08:24
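To illustrate the comment above, here is a short sketch (sample strings are invented) showing that regex metacharacters are escaped with a backslash whether or not the pattern is a raw string. The r prefix only saves you from doubling that backslash.

```python
import re

text = "2*3 = 6"

# The backslash is what makes * literal; r just avoids writing "\\*".
same_pattern = (r"\*" == "\\*")          # True: both are backslash + asterisk
star_hits = re.findall(r"\*", text)      # ['*']

# An unescaped * has nothing to repeat, raw string or not.
try:
    re.compile(r"*")
    bare_star_ok = True
except re.error:
    bare_star_ok = False

# re.escape adds the backslashes for you when the text should match literally.
escaped = re.escape("2*3")               # the four characters 2 \ * 3
literal_hits = re.findall(escaped, text) # ['2*3']
```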
1

The r preceding the string is a string prefix that marks it as a raw string literal.

For example, '\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n.

But for your regex:

moby.findall(r"<a> (<.*>) <man>")

it doesn't make a difference, but it is always a good idea to write regexes as raw strings so that you don't have to double every backslash.
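As for the Text object: as far as the NLTK documentation describes it, `Text.findall` matches over tokens rather than characters, and each token pattern is wrapped in angle brackets. The sketch below is a rough approximation of that idea in plain `re`, not NLTK's actual code. `token_findall` is a hypothetical helper, the sample sentence is invented, and the helper assumes the pattern has at most one capturing group.

```python
import re

def token_findall(tokens, pattern):
    """Hypothetical helper: approximate token-level regex matching."""
    # Encode the token list as one string: ['a', 'man'] -> '<a><man>'
    encoded = "".join("<" + tok + ">" for tok in tokens)
    # Whitespace in the pattern is only there for readability, so drop it.
    pattern = re.sub(r"\s", "", pattern)
    # Let '.' match anything except '>', so it cannot cross a token boundary.
    pattern = re.sub(r"(?<!\\)\.", "[^>]", pattern)
    hits = re.findall(pattern, encoded)
    # Strip the outer brackets and split each hit back into tokens.
    return [hit[1:-1].split("><") for hit in hits]

tokens = "he was a wise man and a brave man".split()
matches = token_findall(tokens, r"<a> (<.*>) <man>")
# matches == [['wise'], ['brave']]

# The pattern has no backslashes, so the r prefix changes nothing here.
prefix_irrelevant = (r"<a> (<.*>) <man>" == "<a> (<.*>) <man>")
```

The parentheses select which part of each match is returned, which is why only the word between "a" and "man" comes back.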

Ricky Han