
In the input string below, I want to replace "item" with "replaced_item", but only where it matches the regex search condition.

re.findall(r"(\bsee\b|\bunder\b|\bin\b|\bof\b|\bwith\b|\bthis\b)( *.{0,4})(item)","i have many roof item in the repeat item of the item inthe item downunder. with any item")

gives output:

 [('of', ' the ', 'item'), ('with', ' any ', 'item')]

I want to replace the "item" keyword in the matched phrases above with "replaced_item".

Expected output: i have many roof item in the repeat item of the replaced_item inthe item downunder. with any replaced_item
Apoorv
  • 1) Use raw string literals to define a regex. Also, `findall` will return the *captured* submatches only. 2) Unclear what you mean, please post the failing code. – Wiktor Stribiżew Jul 19 '17 at 12:52
  • You need to use a raw string literal for your regex. i.e. `re.findall(r"(\bsee\b...")` otherwise the backslashes are treated as control characters. – tzaman Jul 19 '17 at 12:52
  • Thank you @WiktorStribiżew. Raw string literals worked. I have edited the question to make it clearer. – Apoorv Jul 19 '17 at 13:03
  • Ok, so you need to use `re.sub`, not `re.findall`? Or do you want to run these 2 operations separately? Or get all in 1 go? :) It seems you just need a `re.sub` with your current pattern and a `r'\1\2replaced_item'` replacement. See https://regex101.com/r/0Tc9c9/1 – Wiktor Stribiżew Jul 19 '17 at 13:06
  • It worked fine. `\1\2replaced_item` is what I was looking for. Most of the examples talk about the `$1$2` notation from Perl. Thanks a lot for the quick help. – Apoorv Jul 19 '17 at 13:19

1 Answer


You may get the expected output with a \1\2replaced_item replacement string:

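import re
# Group 1: trigger word, group 2: up to 4 chars of filler, group 3: the word to replace
pat = r"\b(see|under|in|of|with|this)\b( *.{0,4})(item)"
s = "i have many roof item in the repeat item of the item inthe item downunder. with any item"
# Put groups 1 and 2 back unchanged and substitute only the trailing "item"
res = re.sub(pat, r"\1\2replaced_item", s)
print(res)
# i have many roof item in the repeat item of the replaced_item inthe item downunder. with any replaced_item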

See the Python demo

Also, note that the word boundaries have been moved out of the alternation: a single \b before the group and a single \b after it apply to every alternative, so the words no longer need their own \b on each side.
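As a quick check of the point about word boundaries, the sketch below runs `re.findall` with both the pattern from the question and the pattern from this answer on the same input. The variable names `verbose` and `compact` are only illustrative.

```python
import re

s = "i have many roof item in the repeat item of the item inthe item downunder. with any item"

# Word boundaries repeated inside every alternative (pattern from the question).
verbose = r"(\bsee\b|\bunder\b|\bin\b|\bof\b|\bwith\b|\bthis\b)( *.{0,4})(item)"
# Word boundaries hoisted outside the group (pattern from the answer).
compact = r"\b(see|under|in|of|with|this)\b( *.{0,4})(item)"

verbose_matches = re.findall(verbose, s)
compact_matches = re.findall(compact, s)

print(compact_matches)                     # [('of', ' the ', 'item'), ('with', ' any ', 'item')]
print(verbose_matches == compact_matches)  # True
```

Neither pattern matches inside "inthe" or "downunder", because "in" and "under" are not delimited by word boundaries there.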

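Just a note: if replaced_item is a placeholder and the real replacement text can start with a digit, you should use r'\1\g<2>replaced_item'. With a plain \2, a digit right after it would be read as part of the group number; \g<2> is an unambiguous backreference notation. See the SO post "python re.sub group: number after \number".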
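To illustrate the ambiguity, here is a minimal sketch that assumes a hypothetical replacement word, `2nd_item`, which starts with a digit. With the plain `\2` form the template becomes `\1\22nd_item`, and Python reads `\22` as a reference to group 22, which does not exist in this pattern.

```python
import re

pat = r"\b(see|under|in|of|with|this)\b( *.{0,4})(item)"
s = "of the item"

# Hypothetical replacement text that begins with a digit (not from the question).
new_word = "2nd_item"

# Plain form: the template is r"\1\22nd_item", so \22 is parsed as group 22.
try:
    re.sub(pat, r"\1\2" + new_word, s)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# Unambiguous form: \g<2> ends the group reference explicitly.
res = re.sub(pat, r"\1\g<2>" + new_word, s)

print(ambiguous_failed)  # True
print(res)               # of the 2nd_item
```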

Wiktor Stribiżew