I would like to replace every run of 3 or more consecutive "=" with an equal number of "-".

def f(a, b):
    '''
    Example
    =======
    >>> from x import y
    '''
    return a == b

becomes

def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b        # don't touch

My working but hacky solution is to pass a lambda as the repl argument of re.sub(), which returns one dash per character of each match:

>>> import re

>>> s = """
... def f(a, b):
...     '''
...     Example
...     =======
...     >>> from x import y
...     '''
...     return a == b"""

>>> eq = r'(={3,})'
>>> print(re.sub(eq, lambda x: '-' * (x.end() - x.start()), s))

def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b

Can I do this without needing to pass a function to re.sub()?

My thinking would be that I'd need r'(=){3,}' (a variable-length capturing group), but re.sub(r'(=){3,}', '-', s) has a problem with greediness, I believe.

Can I modify the regex eq above so that the lambda isn't needed?

Brad Solomon
  • I highly doubt so. Why is the lambda not sufficient? – cs95 Mar 24 '18 at 19:22
  • Possible, but not advisable. The lambda is by far the easiest and most readable solution. – Aran-Fey Mar 24 '18 at 19:29
  • And for completeness: if you use the regex module that @CasimiretHippolyte mentions, there is [one more trick](https://stackoverflow.com/a/24535912/5527985), though it is probably slower in most cases: [`(?<!=)={1,2}(?!=)(*SKIP)(*F)|=`](https://regex101.com/r/9myeWV/1) – bobble bubble Mar 24 '18 at 20:24

5 Answers

With some help from lookahead/lookbehind it is possible to replace by char:

>>> re.sub("(=(?===)|(?<===)=|(?<==)=(?==))", "-", "=== == ======= asdlkfj")
'--- == ------- asdlkfj'
Marat

This uses some lookahead trickery with re.sub and works only if the run you want to replace is always followed by a newline '\n'.

print(re.sub('=(?=={2}|=?\n)', '-', s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b

Details
"Replace an equal sign if it is followed by two more equal signs, or by an optional equal sign and a newline."

=        # equal sign if
(?=={2}  # lookahead
|        # regex OR
=?       # optional equal sign
\n       # newline
)
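
The breakdown above maps fairly directly onto a pattern compiled with re.VERBOSE. The following is only a sketch of that idea; the name UNDERLINE_EQ and the sample strings are made up for illustration. It also shows the limitation: a short run of = sitting right before a newline gets replaced too.

```python
import re

# Hypothetical name; the pattern is the one from this answer, spelled out.
UNDERLINE_EQ = re.compile(r"""
    =          # an equal sign, if
    (?=        # it is followed by
        ={2}   #   two more equal signs
      |        #   or
        =?\n   #   an optional equal sign and a newline
    )
""", re.VERBOSE)

doc = "Example\n=======\nreturn a == b"
print(UNDERLINE_EQ.sub("-", doc))

# Limitation: a run of one or two '=' directly before a newline is replaced as well.
print(UNDERLINE_EQ.sub("-", "x ==\ny"))  # x --, then y on the next line
```

So this is safe for docstring underlines, but probably not for arbitrary source where a line may end in = or ==.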
cs95
  • Seems foolproof to me. But how does a fool judge foolproofness? – Brad Solomon Mar 24 '18 at 19:32
  • @BradSolomon I'd imagine this regex is limited in the sense that it would only replace titular underlines (keep that in mind, as the other solutions may generalise better). Cheers. – cs95 Mar 24 '18 at 19:41

It's possible, but not advisable.

The way re.sub works is that it finds a complete match and then replaces it. It doesn't replace each capture group separately, so something like re.sub(r'(=){3,}', '-', s) won't work: it replaces the entire match with a single dash, not each occurrence of the = character.

>>> re.sub(r'(=){3,}', '-', '=== ===')
'- -'

So if you want to avoid a lambda, you have to write a regex that matches individual = characters, but only those that belong to a run of at least 3. This is, of course, much more difficult than matching 3 or more = characters with the simple pattern ={3,}. It requires lookarounds and looks like this:

(?<===)=|(?<==)=(?==)|=(?===)

This does what you want:

>>> re.sub(r'(?<===)=|(?<==)=(?==)|=(?===)', '-', '= == === ======')
'= == --- ------'

But it's clearly much less readable than the original lambda solution.
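
To convince yourself that the per-character pattern really does the same thing as the lambda, a quick check along these lines may help. The three branches cover the last, middle, and first positions of a run, respectively. The helper names and sample strings are invented for this sketch.

```python
import re

# The lookaround pattern from this answer: each match is a single '='.
PER_CHAR = re.compile(r'(?<===)=|(?<==)=(?==)|=(?===)')

def with_lookarounds(text):
    return PER_CHAR.sub('-', text)

def with_lambda(text):
    # Reference behaviour: replace each run of 3+ '=' with as many '-'.
    return re.sub(r'={3,}', lambda m: '-' * len(m.group()), text)

samples = ['= == === ======', 'a == b', '====', 'x=y===z', '']
for sample in samples:
    assert with_lookarounds(sample) == with_lambda(sample)

print(with_lookarounds('x=y===z'))  # x=y---z
```

The lookbehinds inspect the original string, not the partially replaced one, which is why later = characters in a run still see their neighbours as =.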

Aran-Fey

Using the regex module, you can write:

regex.sub(r'\G(?!\A)=|=(?===)', '-', s)
  • \G is the position immediately after the last successful match or the start of the string.
  • (?!\A) forces the start of the string to fail.

The second branch =(?===) succeeds when a = is followed by two other =. Then the next matches use the first branch \G(?!\A)= until there are no more consecutive =.
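
A rough way to try this out is sketched below. It assumes the third-party regex package is installed (it is not part of the standard library, and the stdlib re module has no \G), so the import is guarded. The function names are made up for the example.

```python
import re

try:
    import regex  # third-party package, assumed to be installed separately
except ImportError:
    regex = None

def dashes_with_search_anchor(text):
    # =(?===) starts a run only when two more '=' follow;
    # \G(?!\A)= then continues right where the previous match ended.
    return regex.sub(r'\G(?!\A)=|=(?===)', '-', text)

def dashes_with_lambda(text):
    # Standard-library reference to compare against.
    return re.sub(r'={3,}', lambda m: '-' * len(m.group()), text)

if regex is not None:
    sample = '= == === ======'
    print(dashes_with_search_anchor(sample) == dashes_with_lambda(sample))
```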

Casimir et Hippolyte

The question explicitly asks for a solution that doesn't use a function, but for completeness, and for anyone looking for a clearer solution that doesn't involve lots of regex tricks, it's possible to use a function as in "Replacing a RegEx with a string of characters with the same length":

re.sub('={3,}', lambda x: '-' * len(x.group()), s)

cookiemonster