I am trying to replace 'hi' and 'hello' with 111, but I am stuck with pandas `Series.str.replace()`. Any suggestions? Thanks!

import re
import pandas as pd

a1 = pd.Series('12:04:25 Roberts: Hi, Hello, hi this hi')


## the regex below also replaces the 'hi' inside 'this'
a1.str.replace(r'(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: 111, 111, 111 t111s 111

## with '^hi$', every 'Hi' and 'hi' is kept (only 'Hello' is replaced)
a1.str.replace(r'(hello|^hi$)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: Hi, 111, hi this hi

## allowing for an optional space and comma gives the same result
a1.str.replace(r'(hello|^\s?hi,?$)', '111', regex=True, flags=re.IGNORECASE)
-- 12:04:25 Roberts: Hi, 111, hi this hi


user5843090
  • Try word boundaries: `a1.str.replace(r'\b(hello|hi)\b', '111', regex=True, flags=re.IGNORECASE)` – anubhava Sep 10 '21 at 06:28
  • Unless you clarify otherwise, your only issue is the missing word boundaries. `this` contains `hi`, and the `hi` pattern matches anywhere inside a string. Adding word boundaries restricts where a match can occur: here, only where `hi` or `hello` is not directly adjacent to a letter, digit, or `_`. This is a common and widely covered issue on SO, and there is a canonical answer. – Wiktor Stribiżew Sep 10 '21 at 07:16
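
The word-boundary suggestion from the comments above can be sketched as follows. This is a minimal example that reuses the sample string from the question; the non-capturing group `(?:...)` is a small substitution for the capturing group in the comment and does not change what is matched.

```python
import re
import pandas as pd

a1 = pd.Series(['12:04:25 Roberts: Hi, Hello, hi this hi'])

# \b matches only at the edge between a word character (letter, digit, _)
# and a non-word character or the string edge, so the 'hi' inside 'this'
# is no longer matched.
result = a1.str.replace(r'\b(?:hello|hi)\b', '111', regex=True, flags=re.IGNORECASE)

print(result[0])
```

With the sample string, only the standalone words are replaced and `this` is left intact.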

1 Answer

You could try adding a lookbehind:

>>> a1.str.replace(r'(?<=\s|,)(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
0    12:04:25 Roberts: 111, 111, 111 this 111
dtype: object
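
One caveat: the lookbehind only checks the character before the match, so it behaves differently from word boundaries on other inputs. The sketch below uses plain `re.sub` on a made-up string (the string `text` is an assumption for illustration, not data from the question) to compare the two patterns.

```python
import re

# Hypothetical input: 'hi' at the very start, and 'hi' as a prefix of 'hiking'
text = 'hi there, hiking this hi'

# Lookbehind: requires a whitespace character or comma before the match,
# but places no constraint on what follows it.
lookbehind = re.sub(r'(?<=\s|,)(hello|hi)', '111', text, flags=re.IGNORECASE)

# Word boundaries: constrain both sides of the match.
boundaries = re.sub(r'\b(?:hello|hi)\b', '111', text, flags=re.IGNORECASE)

print(lookbehind)   # hi there, 111king this 111
print(boundaries)   # 111 there, hiking this 111
```

On this input the lookbehind misses the leading `hi` (nothing precedes it) and rewrites the start of `hiking`, while the word-boundary pattern handles both cases. Which behavior is wanted depends on the real data.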
U13-Forward