How to sub a string with a capturing group that does not end with white space but might have white space after the capturing group?

Question

I am trying to turn a string that looks like `(    self, False   )` into `(self, False)`. The regex I am using:

import re

s = re.compile(r'\(\s*(.*)\s*\)')
s.sub(r'(\1)', '(    self, False   )')

This returns '(self, False   )', with the trailing whitespace still inside the parentheses.

How do I capture the group inside the parentheses without the trailing white spaces?

Does this answer your question? [How to strip all whitespace from string](https://stackoverflow.com/questions/3739909/how-to-strip-all-whitespace-from-string) — Tomerikoo, Apr 06 '21 at 11:11

score 1 · Answer 1 · answered Apr 10 '19 at 02:07


Why not use str.replace to swap every space for an empty string?

text = '(    self, False   )'  # renamed from str to avoid shadowing the built-in
print(text.replace(' ', ''))
# (self,False)

Devesh Kumar Singh


I am running the script on a large chunk of text and I only want to get rid of extra white spaces after the opening and before the closing parenthesis. I don't want to replace all white spaces in the text. – masha Apr 10 '19 at 02:15
Wondering if he wanted to maintain the space in between lol. Great answer – FailSafe Apr 10 '19 at 02:15
Your expected output says `(self,False)` in the question, which doesn't have a whitespace. – Devesh Kumar Singh Apr 10 '19 at 02:16
@DeveshKumarSingh Your directions were kind of unclear, but I posted a solution below – FailSafe Apr 10 '19 at 02:17
@DeveshKumarSingh oh my bad, typed the output wrong, It would be (self, False). Basically whatever it captures between parentheses. Edited my question. – masha Apr 10 '19 at 02:20

score 1 · Answer 2 · edited Apr 06 '21 at 11:13

Try this:

#TEST 1
>>> import re
>>> text = '(    self, False   )'
>>> re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text)
#OUTPUT
'(self, False)'

#TEST 2
>>> text = '''TEbh eyendd dkdkmfkf(    self, False   ) dduddnudmd (    self, False   )
(    self, False   ) fififfj m(    self, False   )kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo'''

>>> print(re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text))
#OUTPUT
TEbh eyendd dkdkmfkf(self, False) dduddnudmd (self, False)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo

#TEST 3
>>> text = '''TEbh eyendd dkdkmfkf(    self) dduddnudmd (    self)
(    self, False   ) fififfj m(    self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo
(self   ) dndnd (self   ) fufufjiri (    self   ) (self   ) (    self)(    self)(self   )(    self   )(self   )(    self   )'''

>>> print(re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text))
#OUTPUT
TEbh eyendd dkdkmfkf(self) dduddnudmd (self)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo
(self) dndnd (self) fufufjiri (self) (self) (self)(self)(self)(self)(self)(self)

Piggybacking off of your simple solution:

>>> text = '''TEbh eyendd dkdkmfkf(    self) dduddnudmd (    self)
(    self, False   ) fififfj m(    self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo
(self   ) dndnd (self   ) fufufjiri (    self   ) (self   ) (    self)(    self)(self   )(    self   )(self   )(    self   )'''

>>> print(re.sub(r'(\()\s*([\S\s]*?)\s*(\))', r'\1\2\3', text))
#OUTPUT
TEbh eyendd dkdkmfkf(self) dduddnudmd (self)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo
(self) dndnd (self) fufufjiri (self) (self) (self)(self)(self)(self)(self)(self)
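
A different technique that may be easier to read, offered only as a sketch: match each pair of parentheses and let str.strip() trim the contents inside a replacement callback. The helper name strip_inside_parens is made up for illustration, and the pattern assumes the parentheses are not nested (with nesting, only the innermost pair would be trimmed).

```python
import re

def strip_inside_parens(text):
    # Hypothetical helper. [^()]* keeps each match inside a single,
    # non-nested pair of parentheses.
    return re.sub(
        r"\(([^()]*)\)",
        lambda m: "(" + m.group(1).strip() + ")",
        text,
    )

print(strip_inside_parens("m(    self, False   )k (self   ) (    self)"))
# m(self, False)k (self) (self)
```

The text outside the parentheses and the single space after the comma are left untouched, because strip() only removes leading and trailing whitespace from the captured group.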

The OP output says `(self,False)` in the question, which doesn't have a whitespace. — Devesh Kumar Singh, Apr 10 '19 at 02:18
@Nishat. I updated it with a 2nd version that accounts for surrounding text — FailSafe, Apr 10 '19 at 02:26

score 1 · Accepted Answer · answered Apr 10 '19 at 04:30

Found a simple solution.

import re

# .*? is non-greedy, so the trailing \s* gets to consume the spaces before ')'
s = re.compile(r'\(\s*(.*?)\s*\)')
s.sub(r'(\1)', 'hi hello ble ble ( self, False   ) ( self      ) (self , greedy    ) (    hello)')
#Output
'hi hello ble ble (self, False) (self) (self , greedy) (hello)'

According to the Python re documentation:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.
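
As a quick check of the quoted behaviour, here is a small sketch that compares the greedy and non-greedy forms, first on the documentation's example and then on the pattern from the question. The variable names are only for illustration.

```python
import re

# The example from the re documentation.
sample = "<a> b <c>"
greedy = re.match(r"<.*>", sample).group(0)    # takes the whole string
lazy = re.match(r"<.*?>", sample).group(0)     # stops at the first '>'

# The same difference with the pattern from the question.
text = "(    self, False   )"
greedy_group = re.search(r"\(\s*(.*)\s*\)", text).group(1)
lazy_group = re.search(r"\(\s*(.*?)\s*\)", text).group(1)

print(repr(greedy), repr(lazy))
# '<a> b <c>' '<a>'
print(repr(greedy_group), repr(lazy_group))
# 'self, False   ' 'self, False'
```

With the greedy (.*), the group swallows the trailing spaces and the following \s* matches nothing. With (.*?), the group stops as early as it can, so \s* gets to consume the spaces before the closing parenthesis.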

Absolutely. Deselect my answer and use this. – FailSafe Apr 10 '19 at 13:49
