6
>>> base64_encode = lambda url : url.encode('base64').replace('\n', '')
>>> s = '<A HREF="http://www.google.com" ID="test">blah</A>'
>>> re.sub(r'(?<=href=")([\w:/.]+)(?=")', base64_encode(r'\1'), s, flags=re.I)
<A HREF="XDE=" ID="test">blah</A>

The base64 encoding of the string http://www.google.com is aHR0cDovL3d3dy5nb29nbGUuY29t, not XDE=, which is the encoding of the literal string \1.

How do I pass the captured group into the function?

Frank Epps

2 Answers

12

Pass a function to re.sub as the replacement. re.sub calls it with each match object, and you pull the group out inside the function:

def base64_encode(match):
    """
    Take a re match object and return the replacement
    string for that match.
    """
    group = match.group(1)
    ...  # code to encode `group` as base64 and store it in `result`
    return result

re.sub(..., base64_encode, s, flags=re.I)
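For reference, here is one way the outline above could be filled in for the string from the question. It is a minimal sketch that assumes Python 3 and the standard `base64` module in place of the Python 2 `'base64'` codec used in the question, and it assumes the URL is plain ASCII.

```python
import base64
import re

def base64_encode(match):
    """Return the base64 encoding of the first captured group."""
    url = match.group(1)
    # b64encode works on bytes, so encode the text first and decode the result.
    # Assumption: the captured URL is plain ASCII.
    return base64.b64encode(url.encode("ascii")).decode("ascii")

s = '<A HREF="http://www.google.com" ID="test">blah</A>'

# Pass the function itself; re.sub calls it with the match object.
result = re.sub(r'(?<=href=")([\w:/.]+)(?=")', base64_encode, s, flags=re.I)
print(result)
# <A HREF="aHR0cDovL3d3dy5nb29nbGUuY29t" ID="test">blah</A>
```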
mgilson
  • Ha ha you Pythoners are too fast. Was writing this answer :) +1 – Aamir Rind Jun 16 '13 at 18:01
  • What if my function has more than one parameter? – aberger Apr 27 '17 at 18:07
  • @aberger -- re will pass a single parameter (the match) to the callback. If you want to pass a second argument that is only dependent on the match (or is constant), you can wrap the first callable with a `lambda` or a simple function defined on the previous line: `re.sub(..., lambda m: callback(m, 'another_arg'), s, flags=re.I)` – mgilson Apr 27 '17 at 21:59
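
The wrapping described in the last comment might look like the sketch below. The function `encode_with_prefix` and its `prefix` argument are hypothetical names made up for illustration, and Python 3 is assumed. `functools.partial` is shown as an alternative to the `lambda`; it is not mentioned in the comment itself.

```python
import base64
import functools
import re

def encode_with_prefix(match, prefix):
    # `prefix` is a hypothetical second argument, used only for illustration.
    encoded = base64.b64encode(match.group(1).encode("ascii")).decode("ascii")
    return prefix + encoded

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
pattern = r'(?<=href=")([\w:/.]+)(?=")'

# re.sub only ever supplies the match, so the lambda adds the extra argument.
with_lambda = re.sub(pattern, lambda m: encode_with_prefix(m, "b64:"), s, flags=re.I)

# Alternative: bind the extra argument by keyword ahead of time.
callback = functools.partial(encode_with_prefix, prefix="b64:")
with_partial = re.sub(pattern, callback, s, flags=re.I)

print(with_lambda)
# <A HREF="b64:aHR0cDovL3d3dy5nb29nbGUuY29t" ID="test">blah</A>
```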
4

Write your function to take a single parameter, which will be a match object (see http://docs.python.org/2.7/library/re.html#match-objects for details on these). Inside your function, use m.group(1) to get the first group from your match object m.

And when you pass the function to re.sub, don't use parentheses. You want to hand over the function itself, not the result of calling it:

re.sub("some regex", my_match_function, s, flags=re.I)
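
To see why the parentheses matter, the sketch below contrasts the two calls on the string from the question. The helper names `encode_text` and `encode_match` are made up for this illustration, and Python 3 with the standard `base64` module is assumed.

```python
import base64
import re

def encode_text(text):
    # Plain helper: base64-encode an ASCII string.
    return base64.b64encode(text.encode("ascii")).decode("ascii")

def encode_match(m):
    # Callback form: receives the match object from re.sub.
    return encode_text(m.group(1))

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
pattern = r'(?<=href=")([\w:/.]+)(?=")'

# With parentheses: the call runs once, before re.sub starts, on the
# two-character string \1, so re.sub just receives the fixed string 'XDE='.
called_early = re.sub(pattern, encode_text(r'\1'), s, flags=re.I)

# Without parentheses: re.sub receives the function and calls it per match.
passed = re.sub(pattern, encode_match, s, flags=re.I)

print(called_early)  # <A HREF="XDE=" ID="test">blah</A>
print(passed)        # <A HREF="aHR0cDovL3d3dy5nb29nbGUuY29t" ID="test">blah</A>
```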
rmunn