1

Let's say I have this string :

<div>Object</div><img src=#/><p> In order to be successful...</p>

I want to substitute every letter between < and > with a #.

So, after some operation, I want my string to look like:

<###>Object<####><##########><#> In order to be successful...<##>

Notice that every character between the two symbols were replaced with # ( including whitespace).

This is the closest I could get:

   r = re.sub('<.*?>', '<#>', string)

The problem with my code is that all characters between < and > are replaced by a single #, whereas I would like every individual character to be replaced by a #.

I tried a mixture of various back references, but to no avail. Could someone point me in the right direction?

buydadip
  • 8,890
  • 22
  • 79
  • 154

2 Answers2

3

What about...:

def hashes(mo):
    replacing = mo.group(1)
    return '<{}>'.format('#' * len(replacing))

and then

r = re.sub(r'<(.*?)>', hashes, string)

The ability to use a function as the second argument to re.sub gives you huge flexibility in building up your substitutions (and, as usual, a named def results in much more readable code than any cramped lambda -- you can use meaningful names, normal layouts, etc, etc).

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • Let's say I have `
    >`, not all characters in between the two primary symbols will be replaced, I would have `<###>>`, could you kindly edit? I'll edit my question.
    – buydadip Jan 19 '15 at 03:35
  • Your '<(.*?)>` matches only the `'
    '` part in `'
    >'`, so of course the extra '`>`' does not participate in the substitution. Maybe you want `<([^<]*>`? I'm only addressing the **substitution** part in the A so far. I see you've edited the Q, let's see...
    – Alex Martelli Jan 19 '15 at 03:41
  • Yes, for the string I am currently using, you're initial answer works perfectly... probably shouldn't have added the **EDIT** part in my question, and just left it that. I realized afterwards how impossible what I wanted really is... in fact I think I'll unedit. – buydadip Jan 19 '15 at 03:58
2

The re.sub function can be called with a function as the replacement, rather than a new string. Each time the pattern is matched, the function will be called with a match object, just like you'd get using re.search or re.finditer.

So try this:

re.sub(r'<(.*?)>', lambda m: "<{}>".format("#" * len(m.group(1))), string)
Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • The answer is good, but it doesn't replace everything in between the two symbols if, say, I have `
    >`. I'll edit my question to include this.
    – buydadip Jan 19 '15 at 03:32
  • Yes, well, that's one of the reasons why you [shouldn't parse HTML or XML with regular expressions](http://stackoverflow.com/a/1732454/1405065). – Blckknght Jan 19 '15 at 03:38