
What does this pattern (?<=\w)\W+(?=\w) mean in a Python regular expression?

# l is a list
print(re.sub(r"(?<=\w)\W+(?=\w)", " ", l))
Blckknght
Deyaa
  • Are you *sure* `l` is a list? If so, that call can't possibly work. The third argument to `re.sub` is supposed to be a single string, which is what the substitution works on. – Blckknght Aug 08 '21 at 00:36
  • yes, it is a string. I was wrong. thanks – Deyaa Aug 08 '21 at 13:19

2 Answers

Here's a breakdown of the elements:

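  • \w matches a single word character: a letter, a digit, or an underscore
  • \W is the opposite of \w; with the + it matches one or more consecutive non-word characters
  • (?<=...) is a "lookbehind assertion": the text just before the current position must match, but it does not become part of the match
  • (?=...) is a "lookahead assertion": the text just after the current position must match, again without being consumed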

So this re.sub call means "wherever one or more non-word characters have a word character (letter, digit, or underscore) immediately before and after them, replace that run of non-word characters with a single space". The surrounding word characters are only checked, not replaced.
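The breakdown above can be checked with a minimal sketch. The sample strings and the name `PATTERN` are made up for illustration. It suggests that a run is replaced only when it sits between two word characters, that runs at the very start or end of the string are left alone, and that an underscore counts as a word character.

```python
import re

PATTERN = r"(?<=\w)\W+(?=\w)"  # hypothetical name for the pattern in the question

# A run of non-word characters between two word characters becomes one space.
between = re.sub(PATTERN, " ", "foo!!!bar")

# Leading and trailing runs lack a word character on one side, so they stay.
edges = re.sub(PATTERN, " ", "--foo--bar--")

# The underscore is a word character, so \W+ never matches it.
underscore = re.sub(PATTERN, " ", "snake_case")

print(between)     # foo bar
print(edges)       # --foo bar--
print(underscore)  # snake_case
```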

And by the way, the third argument to re.sub must be a string (or bytes-like object); it can't be a list.
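If the data really is a list, one option is to apply the substitution to each element in turn. The sketch below uses made-up sample data (`items`) and a hypothetical `PATTERN` name. It first shows that passing the list itself raises `TypeError`, then cleans each string with a list comprehension.

```python
import re

PATTERN = r"(?<=\w)\W+(?=\w)"
items = ["This%^&*(string", "a--b"]  # made-up sample data

# Passing the list itself fails: re.sub expects a string or bytes-like object.
try:
    re.sub(PATTERN, " ", items)
    raised = False
except TypeError:
    raised = True

# Applying re.sub to each element works, because each element is a string.
cleaned = [re.sub(PATTERN, " ", item) for item in items]

print(raised)   # True
print(cleaned)  # ['This string', 'a b']
```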

sj95126

Just put it into a site like regex101.com and hover the cursor over the parts.

It matches runs of non-word characters that sit between word characters. In the string below, for example, it matches everything between the final 'd' of the first 'word' and the initial 'w' of the second 'word':

word^&*((*&^%$%^&*& ^%$£%^&**&^%$£!"£$%^&*()word
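To see what the pattern picks out of that string without a regex site, `re.findall` can list the matches. This is only a sketch; the variable names are illustrative. It finds a single match covering everything between the two words, which `re.sub` then collapses to one space.

```python
import re

PATTERN = r"(?<=\w)\W+(?=\w)"
s = 'word^&*((*&^%$%^&*& ^%$£%^&**&^%$£!"£$%^&*()word'

# With no capture groups in the pattern, findall returns the full matches.
matches = re.findall(PATTERN, s)
collapsed = re.sub(PATTERN, " ", s)

print(len(matches))           # 1
print(matches[0] == s[4:-4])  # True: the match is everything between the two words
print(collapsed)              # word word
```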

Example:

import re

# If the data is a list, pass one element (a string) at a time.
l = ['John Smith', 'This%^&*(string', 'Never!£$Mind^&*I$?/Solved{}][]It']

# l[2] is a string, so re.sub can work on it.
print(re.sub(r"(?<=\w)\W+(?=\w)", " ", l[2]))

Output:

Never Mind I Solved It
MDR