Regular expression in pandas str.replace to exclude partial matches

Question

I am trying to replace 'hi' and 'hello' with 111 but got stuck with pandas.str.replace(). Any suggestions? Thanks!

import re
import pandas as pd

a1 = pd.Series('12:04:25 Roberts: Hi, Hello, hi this hi')


## the regex below also replaces the 'hi' inside 'this'
a1.str.replace('(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: 111, 111, 111 t111s 111

## if I set '^hi$' then 'Hi' will be kept
a1.str.replace('(hello|^hi$)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: Hi, 111, hi this hi

## taking space and comma into consideration still the same
a1.str.replace(r'(hello|^\s?hi,?$)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: Hi, 111, hi this hi

Try word boundaries: `a1.str.replace(r'\b(hello|hi)\b', '111', regex=True, flags=re.IGNORECASE)` — anubhava, Sep 10 '21 at 06:28
Unless clarified, your only issue is missing word boundaries. `this` contains `hi`, and the `hi` pattern matches anywhere inside a string. Adding word boundaries restricts the context where a match can occur: here, only where the match is not directly preceded or followed by a letter, digit or `_`. This is a common and widely covered issue on SO, and there is a canonical answer. — Wiktor Stribiżew, Sep 10 '21 at 07:16
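
The word-boundary suggestion from the comments can be sketched as follows. This is a minimal example that reuses the series from the question and assumes `pandas` is installed; the non-capturing group `(?:...)` and the variable name `result` are illustrative choices, not part of the original comment.

```python
import re
import pandas as pd

a1 = pd.Series(['12:04:25 Roberts: Hi, Hello, hi this hi'])

# \b matches only at a transition between a word character (letter, digit, _)
# and a non-word character or the edge of the string, so the "hi" inside
# "this" is skipped because it is preceded by the letter "t".
result = a1.str.replace(r'\b(?:hello|hi)\b', '111', regex=True, flags=re.IGNORECASE)

print(result[0])
# 12:04:25 Roberts: 111, 111, 111 this 111
```

The raw string prefix `r'...'` keeps Python from treating `\b` as a backspace character before the pattern reaches the regex engine.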

score 0 · Answer 1 · answered Sep 10 '21 at 05:43

You could try adding a lookbehind:

>>> a1.str.replace(r'(?<=\s|,)(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
0    12:04:25 Roberts: 111, 111, 111 this 111
dtype: object
>>>
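
One caveat worth checking before relying on the lookbehind: it only constrains what comes before the match. The sketch below compares it with the word-boundary pattern suggested in the comments. The sample string `'hi Bob, hire him'` and the variable names are hypothetical, chosen only to expose the difference, and the example assumes `pandas` is installed.

```python
import re
import pandas as pd

# Hypothetical sample: "hi" at the very start, plus words that begin with "hi".
s = pd.Series(['hi Bob, hire him'])

# Lookbehind: requires whitespace or a comma before the match, nothing after it.
lookbehind = s.str.replace(r'(?<=\s|,)(hello|hi)', '111', regex=True, flags=re.IGNORECASE)

# Word boundaries: require a non-word character or string edge on both sides.
boundary = s.str.replace(r'\b(hello|hi)\b', '111', regex=True, flags=re.IGNORECASE)

print(lookbehind[0])  # hi Bob, 111re 111m
print(boundary[0])    # 111 Bob, hire him
```

On this input the lookbehind misses the leading `hi` (nothing precedes it) and rewrites the start of `hire` and `him`, while the word-boundary pattern replaces only the standalone word.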

answered Sep 10 '21 at 05:43

U13-Forward
