
I have the following text (just an example): </i>5 <i></i><span class

I'd like to remove this space, so I tried:

re.sub(r'</i>.* <i></i><span class', '</i>%02d<i></i><span class' %, text)

But this did not work. How can I capture whatever ".*" matched and reuse it in the replacement? %02d is obviously incorrect...

Thanks for the help :)

Asked by MarkF6, edited by arshajii

2 Answers


You can use a capturing group:

re.sub(r'</i>(.*) <i></i><span class', r'</i>\1<i></i><span class', text)

This capturing group, (.*), captures the "5", and \1 in the replacement text inserts whatever the group captured. Note the r before the second string: it makes it a raw string, so the backslash in \1 is passed through to the regex engine instead of Python turning '\1' into the control character chr(1).
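A minimal runnable sketch of the call above, applied to the example text from the question. It also shows what happens when the r prefix is dropped from the replacement string (the variable names are just for illustration).

```python
import re

text = '</i>5 <i></i><span class'

# (.*) captures whatever sits between '</i>' and the space; \1 puts it back.
# The replacement is a raw string, so re receives the two characters '\' and '1'.
fixed = re.sub(r'</i>(.*) <i></i><span class', r'</i>\1<i></i><span class', text)
print(fixed)  # </i>5<i></i><span class

# Without the r prefix, Python turns '\1' into chr(1) before re sees it,
# so the replacement contains a control character and the "5" is lost.
broken = re.sub(r'</i>(.*) <i></i><span class', '</i>\1<i></i><span class', text)
print(repr(broken))
```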

Answered by David Robinson

As David mentioned, a capturing group is what you need. To elaborate further:

Round brackets capture whatever they match. This is called a 'capturing group', and a 'backreference' is created to whatever is caught. Groups are numbered from left to right by their opening bracket: the first can be referred to by \1, the second by \2, and so on. So:

(.)b\1

matches 'aba' and 'mbm', but not 'abc' (the third character must repeat the first, and the middle one must be a literal b).
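A quick way to check this is with Python's re.fullmatch, which only succeeds if the whole string matches (the test strings are just examples):

```python
import re

# (.) captures one character, b is literal, \1 must repeat the captured character.
pattern = re.compile(r'(.)b\1')

results = {s: bool(pattern.fullmatch(s)) for s in ['aba', 'mbm', 'abc']}
print(results)  # {'aba': True, 'mbm': True, 'abc': False}
```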

Similarly,

(.)(.)b\1\2

matches 'abbab' and 'xybxy'

and

(.)(.)b\2\1

matches 'abbba' and 'xybyx'
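Here is a small sketch comparing the two patterns on the same strings, using re.fullmatch. It also uses match.group(n) to show that the groups are numbered left to right (the variable names are illustrative):

```python
import re

same_order = re.compile(r'(.)(.)b\1\2')      # groups repeated in the same order
reversed_order = re.compile(r'(.)(.)b\2\1')  # groups repeated in reverse order

words = ['abbab', 'xybxy', 'abbba', 'xybyx']
table = {w: (bool(same_order.fullmatch(w)), bool(reversed_order.fullmatch(w)))
         for w in words}
for w, (same, rev) in table.items():
    print(f'{w}: same order {same}, reversed {rev}')

# group(1) is the first (.), group(2) the second.
m = same_order.fullmatch('xybxy')
print(m.group(1), m.group(2))  # x y
```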

This can then be used to check for a palindrome (not that it's advisable: a regex cannot match palindromes of arbitrary length):

(.?)(.)(.)\3?\2\1

is a regex which will match a palindrome of 3 to 6 characters when applied to the whole string (for example with re.fullmatch).
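A rough check of which strings this pattern accepts, using re.fullmatch to anchor it to the whole string. The candidate strings are arbitrary examples. The last line shows that the unanchored re.search instead finds a palindromic substring inside a longer string:

```python
import re

# (.?) optional outer character, (.) second character, (.) middle character;
# \3? optionally repeats the middle (even lengths), then \2 and \1 mirror the start.
palindrome = re.compile(r'(.?)(.)(.)\3?\2\1')

candidates = ['aa', 'aba', 'abba', 'abcba', 'abccba', 'abcdcba', 'abca']
accepted = {s: bool(palindrome.fullmatch(s)) for s in candidates}
print(accepted)

# re.search is not anchored, so it reports a palindromic substring instead.
print(palindrome.search('abcdcba').group())  # bcdcb
```

Note that 'aa' (too short) and 'abcdcba' (too long) are rejected even though both are palindromes, which is the length limit mentioned above.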

Answered by Siddhartha