I cannot understand the following output:

>>> import re
>>> re.sub(r'(?:\s)ff','fast-forward',' ff')
'fast-forward'

According to the documentation of re.sub:

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

So why is the whitespace included in the matched occurrence, and then replaced, even though I wrapped it in a non-capturing group?

I would like to have the following output:

' fast-forward'
plalanne

2 Answers

The non-capturing group still matches and consumes the matched text. Consuming means that the matched text is added to the match value (the memory buffer allotted for the whole matched substring) and the regex index advances past it. So (?:\s) puts the whitespace into the match value, and the whole match, whitespace plus ff, is replaced with fast-forward.
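One way to see what "consuming" means is to inspect the match object instead of the substitution result. A small sketch with re.search, using the same pattern and input as in the question:

```python
import re

# The non-capturing group (?:\s) still matches and consumes the space,
# so the overall match (group 0) starts at index 0 and includes it.
m = re.search(r'(?:\s)ff', ' ff')
print(repr(m.group()))  # ' ff'
print(m.span())         # (0, 3)

# "Non-capturing" only means that no numbered group is created.
print(m.groups())       # ()
```

Since re.sub replaces the whole of group 0, the space disappears together with ff.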

You want to use a look-behind to check for a pattern without consuming it:

re.sub(r'(?<=\s)ff','fast-forward',' ff')

See the regex demo.
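For comparison, a similar sketch with the look-behind version (same input; the bare 'ff' string in the last call is just an extra illustrative case):

```python
import re

# The look-behind (?<=\s) checks for a preceding whitespace character
# without consuming it, so the match covers only "ff".
m = re.search(r'(?<=\s)ff', ' ff')
print(repr(m.group()))  # 'ff'
print(m.span())         # (1, 3)

# Because the look-behind needs a whitespace character before "ff",
# an "ff" at the very start of the string is left untouched.
print(re.sub(r'(?<=\s)ff', 'fast-forward', 'ff'))  # ff
```

Note that Python's re module only accepts fixed-width look-behind patterns, so something like (?<=\s+)ff raises re.error.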

An alternative approach is to wrap the part of the pattern you need to keep in a capturing group and refer to it with a backreference in the replacement string:

re.sub(r'(\s)ff',r'\1fast-forward',' ff')
         ^  ^      ^^ 

Here, (\s) stores the whitespace in the Group 1 memory buffer, and \1 in the replacement string retrieves it and inserts it into the result.
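The backreference can be written in a few equivalent ways. A sketch using only standard re.sub features; the replacement text '2x' is an arbitrary example chosen to show why the \g<1> form exists:

```python
import re

# \1 re-inserts the whitespace captured by group 1.
print(repr(re.sub(r'(\s)ff', r'\1fast-forward', ' ff')))  # ' fast-forward'

# \g<1> is an equivalent spelling that stays unambiguous when the
# replacement text right after the group reference starts with a digit.
print(repr(re.sub(r'(\s)ff', r'\g<1>2x', ' ff')))  # ' 2x'

# A callable replacement receives the match object and can reuse the group too.
print(repr(re.sub(r'(\s)ff', lambda m: m.group(1) + 'fast-forward', ' ff')))  # ' fast-forward'
```

Writing r'\12x' instead would be read as a reference to group 12, which does not exist in this pattern, so re.sub raises re.error.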

See the Python demo:

import re 
print('"{}"'.format(re.sub(r'(?<=\s)ff','fast-forward',' ff')))
# => " fast-forward"
Wiktor Stribiżew

A non-capturing group still matches (and consumes) the pattern it contains. What you wanted to express was a look-behind, which does not consume its pattern but only asserts that it is present before your match.

That said, if you use a look-behind for whitespace, you might want to consider the word boundary metacharacter \b instead. It matches the empty string between a \w and a \W character (or between a \w character and the start or end of the string), asserting that your pattern is at the beginning of a word.
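The two approaches do not match in exactly the same places: a leading \b also fires at the start of the string and after punctuation, while (?<=\s) needs an actual whitespace character. A quick comparison sketch; the sample strings are arbitrary illustrations:

```python
import re

samples = ['ff', ' ff', '(ff', 'staff']

for s in samples:
    # Look-behind: requires an actual whitespace character before "ff".
    lookbehind = re.sub(r'(?<=\s)ff', 'fast-forward', s)
    # Leading \b: requires a \W character (or the string start) before "ff".
    boundary = re.sub(r'\bff', 'fast-forward', s)
    print(repr(s), repr(lookbehind), repr(boundary))

# Expected output:
# 'ff' 'ff' 'fast-forward'
# ' ff' ' fast-forward' ' fast-forward'
# '(ff' '(ff' '(fast-forward'
# 'staff' 'staff' 'staff'
```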

import re

re.sub(r'\bff\b', 'fast-forward', ' ff') # ' fast-forward'

Adding a trailing \b also makes sure that you only match ff as a whole word, that is, when it is delimited by non-word characters or the string edges (not necessarily whitespace), and not at the beginning of a longer word such as 'ffoo'.
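A short sketch of what the trailing \b adds; again, the sample strings are only illustrations:

```python
import re

pattern = r'\bff\b'  # "ff" as a whole word only

for s in [' ff', ' ffoo', ' ff.', 'a-ff-b']:
    print(repr(s), '->', repr(re.sub(pattern, 'fast-forward', s)))

# Expected output:
# ' ff' -> ' fast-forward'
# ' ffoo' -> ' ffoo'
# ' ff.' -> ' fast-forward.'
# 'a-ff-b' -> 'a-fast-forward-b'
```

The last two cases show that the delimiters do not have to be whitespace: any non-word character satisfies \b.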

See the demo.

Olivier Melançon