
Is there a way to pass a pattern containing a capture group and replace only the text matched by that group?

For example, the pattern <table class="(old_class)"> should turn the matched text into <table class="new_class">, without my having to repeat the rest of the pattern in the replacement.

I can do this:

>>> import re
>>> re.sub(r'(.*?)regular expressions(.*?)', r'\1everything\2', 'I like regular expressions in Python.')
'I like everything in Python.'

But I am looking for an elegant way to replace the text matched by capture groups, rather than backreferencing everything around it. I think this is a common pattern, so there should be a built-in way to do it instead of writing my own function.

P.S. HTML is just an example.
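
For reference, a hand-rolled helper of the kind I would like to avoid might look like the sketch below. `replace_groups` is a hypothetical name, not part of the `re` module, and the sketch assumes the groups being replaced do not nest or overlap.

```python
import re


def replace_groups(pattern, replacements, text, flags=0):
    """Replace only the text matched by capture groups.

    `replacements` maps a group number or name to its new text.
    Hypothetical helper; assumes the listed groups do not nest or overlap.
    """
    def rebuild(match):
        whole = match.group(0)
        base = match.start()  # offset of the whole match inside `text`
        # Collect the spans of the groups that actually took part in the match,
        # ordered left to right.
        spans = sorted(
            (match.span(group), new)
            for group, new in replacements.items()
            if match.start(group) != -1
        )
        pieces = []
        cursor = 0
        for (start, end), new in spans:
            pieces.append(whole[cursor:start - base])  # untouched text before the group
            pieces.append(new)                         # replacement for the group
            cursor = end - base
        pieces.append(whole[cursor:])                  # untouched tail of the match
        return "".join(pieces)

    return re.sub(pattern, rebuild, text, flags=flags)


print(replace_groups(r'<table class="(old_class)">',
                     {1: 'new_class'},
                     '<table class="old_class">'))
```

The pattern is written once, with parentheses marking what should change, and the surrounding text of each match is carried over unchanged.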

warvariuc
  • http://stackoverflow.com/questions/2073541/search-and-replace-in-html-with-beautifulsoup ? – knh170 Mar 31 '16 at 08:49
  • You should use an Html Parser instead of regex. – styvane Mar 31 '16 at 08:50
  • `re.sub(r'(
    – Avinash Raj Mar 31 '16 at 08:50
  • or, `re.sub(r'(<table[^>]*\bclass=")[^"]*"', r'\1new_class"', s)`
    – Avinash Raj Mar 31 '16 at 08:51
  • I believe you want to replace a group and save it for later use, right? If HTML is used for example only, please think of a better example: HTML and regex is a dangerous combination. Also, see similar [*How to substitute into a regular expression group in Python*](http://stackoverflow.com/questions/3059151/how-to-substitute-into-a-regular-expression-group-in-python) and [*Find and sub in one line using Python re*](http://stackoverflow.com/a/33962681/3832970) – Wiktor Stribiżew Mar 31 '16 at 08:53
  • @AvinashRaj yes, that's what I do, but it is not very readable. Also, sometimes you have to use both `\1` and `\2`. So I wonder if there is a reverse way to do this: not backreferencing the text matched by capture groups, but replacing it directly. I could write a special function for this, but I think there should be a built-in way. P.S. HTML is just an example. – warvariuc Mar 31 '16 at 09:04
  • I have voted to close my own question, as I feel it's very subjective. I don't want to delete it, since others may find the implementation useful. – warvariuc Mar 31 '16 at 09:14
  • You can even use `re.sub(r'regular\s+expressions', r'everything', 'I like regular expressions in Python.')` - much simpler. – Wiktor Stribiżew Mar 31 '16 at 11:19
  • @WiktorStribiżew it doesn't work when you need to replace `regular expressions` with `everything` only when it appears inside `I love *** in Python`. That's the use case. – warvariuc Mar 31 '16 at 15:26
  • I am afraid I have no clue what you mean. There is no `regular expressions` in `I love *** in Python`. Maybe you need to replace all spaces with hyphens inside parentheses in `I like (not so regular) expressions`? See [`re.sub(r"\([^()]*\)", lambda m: m.group().replace(" ", "-"), "I like (not so regular) expressions")`](http://ideone.com/5H7s3L). – Wiktor Stribiżew Mar 31 '16 at 16:38
  • @WiktorStribiżew You suggested using `re.sub(r'regular\s+expressions', r'everything', 'I like regular expressions in Python.')`. This will replace all occurrences of `regular expressions`. I need to replace only those surrounded by "I like" and "Python". – warvariuc Mar 31 '16 at 16:46
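
The context-restricted replacement discussed in the last few comments could also be sketched with lookaround assertions, which check the surrounding text without making it part of the match. This is only an illustration, not something proposed in the thread; the sample sentence is made up, and Python's `re` requires the lookbehind to have a fixed width, which `I like ` satisfies.

```python
import re

# Made-up sample text: only the first occurrence is surrounded by
# "I like " and " in Python".
text = "I like regular expressions in Python. Others dislike regular expressions."

# (?<=...) and (?=...) assert the context but do not consume it,
# so only the phrase itself is replaced.
pattern = r"(?<=I like )regular expressions(?= in Python)"

result = re.sub(pattern, "everything", text)
print(result)
```

Here the second occurrence of the phrase is left alone because it does not have the required context, and no backreferences are needed in the replacement string.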
