0

I want to replace every 'A' in the middle of a string with '*' using a regex in Python. I tried this:

re.sub(r'[B-Z]+([A]+)[B-Z]+', r'*', 'JAYANTA ')

but it outputs '*ANTA '.

I want it to be 'J*Y*NTA'.

Can someone provide the required code? I would like an explanation of what is wrong in my code if possible.

Aditya Guru

3 Answers

2

Use the non-word-boundary assertion \B to make sure that the A's are surrounded by word characters:

import re
text = 'JAYANTA POKED AGASTYA WITH BAAAAMBOO '
# \B matches only where there is no word boundary, so each run of A's
# must have a word character on both sides.
result = re.sub(r'\BA+\B', '*', text)
print(result)

Prints:

J*Y*NTA POKED AG*STYA WITH B*MBOO 

Alternatively, if you want to require specifically that the A's are surrounded by upper-case letters, you can use a lookbehind and a lookahead instead:

result = re.sub(r'(?<=[A-Z])A+(?=[A-Z])', '*', text)
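
The two patterns above agree on the sample sentence, but they can differ on other input. A small sketch of the difference, using a made-up string (`sample` is only an illustration, not from the question) that mixes upper case, lower case and digits:

```python
import re

# Hypothetical input: A's surrounded by upper case, lower case and digits.
sample = 'BAAD xAy 1A2'

# \B only requires word characters (letters, digits, underscore) on both sides.
by_boundary = re.sub(r'\BA+\B', '*', sample)

# The lookarounds require an upper-case letter on both sides.
by_lookaround = re.sub(r'(?<=[A-Z])A+(?=[A-Z])', '*', sample)

print(by_boundary)    # B*D x*y 1*2
print(by_lookaround)  # B*D xAy 1A2
```

Which one fits depends on whether digits and lower-case neighbours should count as "the middle of a word".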
LukStorms
  • It's because your pattern also includes the characters surrounding the A in the match, so they are replaced along with it. That's where lookarounds are handy: they are not included in the match, they only check that the character is there. – LukStorms Apr 09 '17 at 11:32
  • Your regex gave '*ANTA ' as output because, after 'JAY' was replaced with '*', there was no other match. – LukStorms Apr 09 '17 at 11:36
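
To see what these two comments describe, it may help to list what each pattern actually matches. This is only an illustrative sketch: the variable names are made up, and the second pattern is the lookaround version from the answer above.

```python
import re

text = 'JAYANTA '

# The question's pattern consumes the letters around the A, so the whole
# of 'JAY' is the match, and the Y is no longer available to precede the
# second A.
consuming = [m.group(0) for m in re.finditer(r'[B-Z]+([A]+)[B-Z]+', text)]

# Lookarounds only check the neighbours; the match itself is just the A's.
zero_width = [(m.start(), m.group(0))
              for m in re.finditer(r'(?<=[A-Z])A+(?=[A-Z])', text)]

print(consuming)   # ['JAY']
print(zero_width)  # [(1, 'A'), (3, 'A')]
```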
1
>>> re.sub(r'(?!^)[Aa](?!$)','*','JAYANTA')
'J*Y*NTA'

My regex matches an A (upper or lower case) that is neither at the start of the string, (?!^), nor at the end of the string, (?!$).

Casper
  • Replace all the `A`s inside the string, not at the start or end of it: *I wanted to replace all 'A' **in the middle of string** by '*'* – Mr. Xcoder Apr 08 '17 at 10:22
  • There is a space at the end of the string. In this case, your regex replaces the last A. – Toto Apr 08 '17 at 10:38
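
The point raised in the comments can be checked directly. A short sketch, where the third input ('ANNA ADA') is a made-up multi-word example rather than anything from the question:

```python
import re

pattern = r'(?!^)[Aa](?!$)'

no_space = re.sub(pattern, '*', 'JAYANTA')
trailing_space = re.sub(pattern, '*', 'JAYANTA ')
two_words = re.sub(pattern, '*', 'ANNA ADA')  # hypothetical extra input

print(repr(no_space))        # 'J*Y*NTA'
print(repr(trailing_space))  # 'J*Y*NT* '
print(repr(two_words))       # 'ANN* *DA'
```

The assertions anchor to the start and end of the whole string, not of each word, so any A that is first or last in a word but not in the string is still replaced.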
1

Lookahead assertion:

>>> re.sub(r'A(?=[A-Z])', r'*', 'JAYANTA ')
'J*Y*NTA '

If the word starts and ends with 'A', add a lookbehind so the leading 'A' is kept as well:

>>> re.sub(r'(?<=[A-Z])A(?=[A-Z])', r'*', 'AJAYANTA ')
'AJ*Y*NTA '
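
One detail worth noting: this pattern matches a single A at a time, so a run of A's yields one '*' per letter, whereas A+ collapses the run into a single '*'. A minimal sketch of the difference, assuming the input word 'BAAAAMBOO' purely for illustration:

```python
import re

word = 'BAAAAMBOO'

# One replacement per A: lookbehind and lookahead are checked against
# the original text, so every A in the run qualifies.
per_letter = re.sub(r'(?<=[A-Z])A(?=[A-Z])', '*', word)

# One replacement per run of A's.
per_run = re.sub(r'(?<=[A-Z])A+(?=[A-Z])', '*', word)

print(per_letter)  # B****MBOO
print(per_run)     # B*MBOO
```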
Hackaholic