Python: Find and print previous element

Question

I have the following text (just an example): </i>5 <i></i><span class

I'd like to remove the space after the 5, so I tried:

re.sub(r'</i>.* <i></i><span class', '</i>%02d<i></i><span class' %, text)

But this did not work. How can I catch the "thing" which is found in ".*"? %02d is obviously incorrect...

Thanks for the help :)

Requisite link to another post about [parsing HTML with regular expressions](http://stackoverflow.com/a/1732454/418413) — kojiro, Sep 06 '13 at 02:36

score 1 · Accepted Answer · answered Sep 06 '13 at 02:35

You can use a capturing group:

re.sub(r'</i>(.*) <i></i><span class', r'</i>\1<i></i><span class', text)

The capturing group, (.*), captures the "5", and \1 in the replacement text inserts it. Note the r before the second string: it makes the string a raw string, so \1 reaches re.sub as a backslash followed by 1 instead of being interpreted by Python as an escape sequence.
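A minimal runnable sketch of the substitution above, using the sample text from the question. The variable names (text, pattern, fixed, captured) are only illustrative.

```python
import re

# Sample text from the question; the unwanted space sits between "5" and "<i>".
text = '</i>5 <i></i><span class'

# The parentheses form group 1, which captures whatever .* matched.
pattern = r'</i>(.*) <i></i><span class'

# \1 in the (raw) replacement string re-inserts the text captured by group 1.
fixed = re.sub(pattern, r'</i>\1<i></i><span class', text)
print(fixed)  # </i>5<i></i><span class

# The captured text can also be read directly from a match object.
captured = re.search(pattern, text).group(1)
print(captured)  # 5
```

Keep in mind that .* is greedy, so on longer input containing several such fragments it may capture more than a single number.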

score 0 · Answer 2 · edited May 23 '17 at 12:28

As David mentioned, a capturing group is what you need. To elaborate further:

Round brackets capture whatever they match. This is called a 'capturing group', and a 'backreference' is created to whatever is caught. Groups are numbered from left to right by their opening bracket, so the first can be referred to by \1, the second by \2, and so on. So:

(.)b\1

matches 'aba' and 'mbm', but not 'abc'.

Similarly,

(.)(.)b\1\2

matches 'abbab', 'xybxy'

and

(.)(.)b\2\1

matches 'abbba', 'xybyx'
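The backreference patterns above can be checked in Python with a short sketch like the following. It uses re.fullmatch so that the whole string has to match; the helper name matches and the results dictionary are only illustrative.

```python
import re

def matches(pattern, s):
    """Return True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

results = {
    # \1 must repeat the character captured by the first group.
    ('(.)b\\1', 'aba'): matches(r'(.)b\1', 'aba'),
    ('(.)b\\1', 'abc'): matches(r'(.)b\1', 'abc'),
    # \1\2 repeats both captured characters in the same order.
    ('(.)(.)b\\1\\2', 'abbab'): matches(r'(.)(.)b\1\2', 'abbab'),
    ('(.)(.)b\\1\\2', 'xybxy'): matches(r'(.)(.)b\1\2', 'xybxy'),
    # \2\1 repeats them in reverse order.
    ('(.)(.)b\\2\\1', 'abbba'): matches(r'(.)(.)b\2\1', 'abbba'),
    ('(.)(.)b\\2\\1', 'xybyx'): matches(r'(.)(.)b\2\1', 'xybyx'),
}

for (pattern, s), ok in results.items():
    print(pattern, repr(s), ok)
```

Only the 'abc' case prints False; the other five print True.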

This can then be used to check for a palindrome (not that it's advised: a pattern like this only handles palindromes up to a fixed length, not ones of arbitrary length):

(.?)(.)(.)\3?\2\1

is a regex which, applied to a whole string, will match palindromes of length 3 to 6.
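A quick way to see which inputs the palindrome pattern above accepts is to run it with re.fullmatch over a few test strings. This is only a sketch: the names PALINDROME, samples and results are illustrative, and the slice comparison at the end is a plain-Python alternative that is not part of the answer.

```python
import re

# The pattern from the answer, matched against the whole string.
PALINDROME = re.compile(r'(.?)(.)(.)\3?\2\1')

samples = ['aba', 'abba', 'abcba', 'abccba', 'abc', 'aa', 'abcdcba']
results = {s: PALINDROME.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(s, ok)

# For palindromes of any length, comparing the string with its reverse
# is simpler than a regex.
def is_palindrome(s):
    return s == s[::-1]

print(is_palindrome('abcdcba'))  # True
```

With these samples the pattern accepts 'aba', 'abba', 'abcba' and 'abccba', and rejects 'abc', 'aa' and the seven-character 'abcdcba'.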
