0

I am having trouble processing XML text. I want to remove the outer parentheses from my text, as follows:

from <b>(apa-bhari(n))</b> to <b>apa-bhari(n)</b>

I wrote the following code:

name= re.sub('<b>\((.+)\)</b>','<b>\1</b>',name)

But this only returns

<b></b>

I do not understand escape sequences and backreferences. Please tell me the solution.

phihag
SAKAMOTO
  • 3
    @ThiefMaster: Not everyone has english as his native tongue and is able to correctly and completely express what he wants to say (since english is a second language to me I've got a hard time as well quite often). – DarkDust May 15 '11 at 18:59
  • To me it sounded as if he doesn't (want to) understand the principle behind those things. – ThiefMaster May 15 '11 at 23:40

3 Answers

2

You need to use raw strings, or escape the backslashes:

name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)
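As a quick check, here is a minimal sketch that runs the raw-string version against the sample string from the question. The variable `result` is only an illustrative name.

```python
import re

name = '<b>(apa-bhari(n))</b>'

# Raw strings keep each backslash literally, so the regex engine
# receives \( and \) (literal parentheses) and \1 (backreference).
result = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)
print(result)  # <b>apa-bhari(n)</b>
```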
Community
Kobi
  • don't use a raw string for the first string, because `\\(` is not correct, `\(` is. – bfontaine May 15 '11 at 14:06
  • 1
    @boudou - Actually, you need to escape that backslash as well - the regex engine needs to see `\(`, so the string should be `'\\('`, or `r'\('` : http://ideone.com/X3tEN – Kobi May 15 '11 at 14:08
  • Thank you! I did not know the use of raw string. – SAKAMOTO May 15 '11 at 14:24
1

In a regular Python string literal, a backslash followed by a digit is an octal escape, so you need to escape the backslash itself. The following expressions are all true:

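assert '\1' == '\x01'    # '\1' is an octal escape: one character, U+0001
assert len('\\1') == 2   # an escaped backslash followed by the digit 1
assert '\)' == '\\)'     # '\)' is not a recognized escape, so the backslash is kept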
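Applied to the code in the question, this would explain the empty-looking result: the replacement string contains the control character U+0001 instead of a backreference. The sketch below is only an illustration (`replacement` and `broken` are made-up names), and it writes the pattern as a raw string because recent Python versions warn about unrecognized escapes such as `'\('`.

```python
import re

name = '<b>(apa-bhari(n))</b>'

# '\1' in a normal string literal is the single character U+0001,
# not a backslash followed by the digit 1.
replacement = '<b>\1</b>'
assert replacement == '<b>\x01</b>'

# The pattern itself matches fine; only the replacement was affected.
broken = re.sub(r'<b>\((.+)\)</b>', replacement, name)
print(repr(broken))  # '<b>\x01</b>'
```

The invisible control character is why the output looks like an empty `<b></b>` element when printed.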

So, your code would be

name = re.sub('<b>\\((.+)\\)</b>','<b>\\1</b>',name)

Alternatively, use raw string literals (the r prefix):

name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>',name)
phihag
  • No, he wants to escape parentheses, so `\(` is ok, not `\\(`. – bfontaine May 15 '11 at 14:04
  • 1
    @boudou `'\\('` is the same as `[(]`, i.e. match a parenthesis. `'\('` falls back to the default escaping. Your argument applies to regexp strings starting with `r'`. – phihag May 15 '11 at 14:09
  • Thank you very much. This can also return the results I want. – SAKAMOTO May 15 '11 at 14:12
  • @boudou Yes, that's precisely what I tried to say by mentioning it falls back to default escaping rules. – phihag May 15 '11 at 14:22
1

Try:

name= re.sub('<b>\((.+)\)</b>','<b>\\1</b>',name)

or, if you do not want unreadable code with \\ everywhere you use backslashes, do not escape the backslashes manually; instead, add an r before the string. For example, r"my\String" is the same as "my\\String".
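The equivalence between the r prefix and manual escaping can be checked directly. This is a small sketch, with `raw` and `escaped` as illustrative names:

```python
# A raw string literal leaves backslashes untouched, so these are equal.
raw = r'<b>\1</b>'
escaped = '<b>\\1</b>'
assert raw == escaped
assert r'\(' == '\\('

# Both spell a backslash followed by the digit 1 (two characters),
# whereas a plain '\1' collapses into one control character.
assert len(r'\1') == 2
assert len('\1') == 1
```

Note that a raw string literal cannot end in a single backslash, because the backslash still prevents the following quote from closing the literal.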

bfontaine