0

I'm trying to recursively replace a string with another string in Python. I'm aware of this thread, but coming from other languages I'm amazed: is it really that hard? Is there no way to do this with a one-liner?

astring="<li><a href="#Quick Start">Quick Start*</li></li>
<li><a href="#Parsing a Document">Parsing a Document*</li></li>
<ul>
<li><a href="#Parsing HTML">Parsing HTML*</li></li>
<li><a href="#Parsing XML">Parsing XML*</li></li>"

tweaked = re.sub(r"\*",r"</a>", astring)

I thought the r meant recursive, but it doesn't seem to do it here. Is it really this hard to do a simple replace?

I've tried string.replace, which didn't work; I think it's the newlines, maybe? Then I tried string.translate, which wanted the same number of characters in the replacement string, so it didn't work for this example and took too many lines of code. I've tried numerous versions of this. What am I doing wrong?

Maybe I misunderstand recursive? I thought it meant 'not one match, keep going till the end' sort of thing? I want to replace the * with the </a>. The astring part is just an example and not the actual string I'm trying to replace, as it's huge. (Please also excuse my newbness.)

PLEASE VOTE THIS QUESTION DOWN TO OBLIVION

Community
  • 1
  • 1
  • 1
    Do you want to replace all asterisk symbols with '</a>'? I didn't get what part of it is 'recursive'. – KL-7 Nov 26 '11 at 12:40
  • The word "recursive" doesn't seem to make any sense here. Could you please clarify your question? – Sven Marnach Nov 26 '11 at 12:41
  • 1
    Btw, using `str` as a name for a variable in Python is not a good idea, as it's also the name of the built-in [`str`](http://docs.python.org/library/functions.html#str) type. Another question is whether `str` is defined in your code exactly as in the code snippet above. If it is, then it needs some care: the string literal spans multiple lines and the double quotes inside it need escaping. To solve both problems at once, just use """ (triple quotes, see [docs](http://docs.python.org/tutorial/introduction.html#strings) for more info). – KL-7 Nov 26 '11 at 12:55
  • The string is invalid, as it contains unescaped quotes. Start by replacing all the "" inside by '' – Óscar López Nov 26 '11 at 12:57
  • OK, yeah, but imagine the string contains that after escaping; that's just an example. Didn't think of that, sorry! Edited above to clarify myself... it's not the string that's the problem, it's the regex command... thanks guys. –  Nov 26 '11 at 12:59
  • No problem, you're welcome. But next time you "think 'r' means recursive", don't bother yourself with thinking and go straight to the documentation. And "recursive" is derived from ["recursion"](http://en.wikipedia.org/wiki/Recursion), which is something a bit different from just global replacement. – KL-7 Nov 26 '11 at 13:08
  • You know what though, if you're a beginner, the Python standard documentation, although quite comprehensive, is awful to understand. It's completely confusing and I always end up looking for other tutorials written for humans. In this case, I'm just an idiot :) –  Nov 26 '11 at 13:52

3 Answers

2

There are a few things to note:

  1. The string is not valid Python syntax. It is delimited with double quotes, yet there are double quotes within it. Because it also spans several lines, the simplest fix is triple quotes, str = """blah blah"""; a single-line string could instead be delimited with single quotes, str = 'blah blah'.

  2. str is the name of a built-in type. It is good practice not to shadow built-ins (though it is allowed).

  3. r"" defines a 'raw string', in which backslashes are not treated as escape characters; the r does not mean 'recursive'. See docs.

  4. re.sub() does replace all non-overlapping matches in the string, which is what you want. In simple cases like this one, str.replace() should be preferred, for example mystring.replace('*', '</a>'). See docs.

Given these points, this code:

import re

mystring = '''<li><a href="#Quick Start">Quick Start*</li></li>
<li><a href="#Parsing a Document">Parsing a Document*</li></li>
<ul>
<li><a href="#Parsing HTML">Parsing HTML*</li></li>
<li><a href="#Parsing XML">Parsing XML*</li></li>'''

mynewstring = re.sub(r'\*', '</a>', mystring)
print(mynewstring)

produces the following output:

<li><a href="#Quick Start">Quick Start</a></li></li>
<li><a href="#Parsing a Document">Parsing a Document</a></li></li>
<ul>
<li><a href="#Parsing HTML">Parsing HTML</a></li></li>
<li><a href="#Parsing XML">Parsing XML</a></li></li>

Note that the forward slash in the replacement string </a> does not need to be escaped. However, the asterisk in the pattern does need to be escaped (\*), because an unescaped * is a regex quantifier and is not a valid pattern on its own.
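As a rough sketch of the escaping point (the `sample` string and variable names are illustrative, not taken from the question): a bare `*` is rejected as a pattern, while a backslash or `re.escape` turns it into a literal asterisk. It also shows that `re.sub` replaces every match unless the optional `count` argument limits it.

```python
import re

sample = "Quick Start*</li>\nParsing HTML*</li>"  # illustrative two-line input

# A bare '*' is a quantifier with nothing to repeat, so it does not compile.
try:
    re.compile("*")
    bare_star_is_valid = True
except re.error:
    bare_star_is_valid = False

# Escaping by hand and escaping via re.escape match the same literal asterisk.
by_hand = re.sub(r"\*", "</a>", sample)
by_escape = re.sub(re.escape("*"), "</a>", sample)

# count=1 stops after the first match; by default every match is replaced.
first_only = re.sub(r"\*", "</a>", sample, count=1)

print(bare_star_is_valid)
print(by_hand)
print(first_only)
```

Both asterisks are replaced across the newline in `by_hand`, so line breaks in the input do not stop the substitution.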

KL-7
Rob Cowie
  • thanks :) Did not know r meant raw, I shall read further, but this and string.replace *do not* work on my string, I already tried that before I posted. –  Nov 26 '11 at 13:21
  • The code above _does_ work as displayed (well, with `import re`). Perhaps you could make your code available? Shove it in a gist (https://gist.github.com/) or some other pastebin. – Rob Cowie Nov 26 '11 at 13:26
  • right... will do... edit: really stupid embarrassing mistake. god can come along and swallow me and this thread up no problem. –  Nov 26 '11 at 13:28
  • And what was the problem? Btw, I think for this particular task `s.replace('*', '</a>')` is better than using a regexp. – KL-7 Nov 26 '11 at 13:34
  • The problem was that it was in a function and was printing out the wrong string, so I tried about 100 different changes and nothing worked, unsurprisingly. Yes, replace is definitely better. Thanks for your help and sorry to be such an idiot. /me hangs head in shame –  Nov 26 '11 at 13:40
  • @cigar: Ha. Don't worry bout it. That kind of thing happens to all of us. – Rob Cowie Nov 26 '11 at 13:55
  • thanks :) dont know what I would do without you guys. probably still be tapping away on string.replace scratching my head. –  Nov 26 '11 at 14:34
1

Taking into account the suggestions in the comments, here's a possible solution:

string = """<li><a href="#Quick Start">Quick Start*</li></li>
<li><a href="#Parsing a Document">Parsing a Document*</li></li>
<ul>
<li><a href="#Parsing HTML">Parsing HTML*</li></li>
<li><a href="#Parsing XML">Parsing XML*</li></li>"""

string = string.replace("*", "</a>")
print(string)
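One detail that may help here, sketched with illustrative names that are not from the question: `str.replace` never changes the string it is called on. It returns a new string, so the result has to be assigned (or printed) to be seen.

```python
original = "Parsing HTML*</li>\nParsing XML*</li>"  # illustrative input

# str.replace leaves 'original' untouched and returns a new string.
result = original.replace("*", "</a>")

# Rebinding the same name to the result is fine and is the usual idiom.
text = original
text = text.replace("*", "</a>")

print(original)  # still contains the asterisks
print(result)    # asterisks replaced on both lines
```

If the replaced text seems to be missing, it is worth checking that the variable being printed is the one that received the return value.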
Óscar López
  • 1
    Thank you, seems I was just being a massive newb. Only been doing it a few days, sorry! –  Nov 26 '11 at 13:33
1

In Python r'' and r"" denote raw strings. Within a raw string, no backslash interpretation is done.

The following seems to work pretty well:

foo="""<li><a href="#Quick Start">Quick Start*</li></li>
<li><a href="#Parsing a Document">Parsing a Document*</li></li>
<ul>
<li><a href="#Parsing HTML">Parsing HTML*</li></li>
<li><a href="#Parsing XML">Parsing XML*</li></li>"""

foo = foo.replace('*', '</a>')
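To make the raw-string point concrete, here is a small sketch (the variable names are illustrative). It compares a raw literal with its escaped equivalent, and shows how the same result looks through `repr`, which is roughly what an interactive session echoes, versus `print`.

```python
# A raw string keeps the backslash as an ordinary character.
raw_pattern = r"\*"
escaped_pattern = "\\*"
same = raw_pattern == escaped_pattern

newline_len = len("\n")       # one character: a newline
raw_newline_len = len(r"\n")  # two characters: a backslash and the letter n

foo = "Parsing HTML*\nParsing XML*".replace("*", "</a>")

print(same, newline_len, raw_newline_len)
print(repr(foo))  # the newline is displayed as the escape sequence \n
print(foo)        # the newline is displayed as an actual line break
```

The replacement covers both lines in either view; only the way the newline is displayed differs.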
jsbueno
Vatine
  • Does this replace stop at newlines, perhaps? I tried this already but it didn't work for some reason. Any idea why? –  Nov 26 '11 at 13:24
  • It worked when I tested it in a python REPL just now. Bear in mind that the newlines will be represented as `"\n"` when you're looking at just the return value. To see the newlines more clearly, try `print foo.replace...` – Vatine Nov 26 '11 at 13:26
  • OH! Serious doh moment. Why... oh no. I mean, thanks dude or dudette, it seems I can't replace the same variable with itself. OMG, that's really stupid of me. Thanks for that little gem. When I print it, I can see the replacement, but before I was trying to do string = string.replace(blah), and that didn't display it. –  Nov 26 '11 at 13:30
  • I suppose the next question is, why can I replace a variable with itself in regex commands but not string.replace? –  Nov 26 '11 at 13:34
  • IGNORE THAT, I'm even stupider than I thought, which, recursively speaking, is not that surprising I suppose –  Nov 26 '11 at 13:36