I have a sentence:
'hi how <unk> are you'
I need to remove <unk> from it.
Here is my code:
re.sub(r'\b{}\b'.format('<unk>'), '', 'agent transcript str <unk> with chunks for key phrases')
Why doesn't my RegEx work for <...>?
There is no word boundary between a space and a < or >. You could instead try
re.sub(r'(\s*)<unk>(\s*)', r'\1\2', your_string)
Or, if you don't want two spaces left behind, you may try
re.sub(r'(\s*)<unk>\s+', r'\1', your_string)
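To make the difference between the two substitutions concrete, here is a minimal sketch that runs both on the sentence from the question. The variable names (sentence, kept_both, kept_leading) are only illustrative and not part of the original code.

```python
import re

sentence = 'hi how <unk> are you'

# Variant 1: keep the whitespace on both sides of the token,
# which leaves two spaces where <unk> used to be.
kept_both = re.sub(r'(\s*)<unk>(\s*)', r'\1\2', sentence)

# Variant 2: consume the whitespace after the token,
# so only the leading space survives.
kept_leading = re.sub(r'(\s*)<unk>\s+', r'\1', sentence)

print(repr(kept_both))     # 'hi how  are you'
print(repr(kept_leading))  # 'hi how are you'
```

Note that the second pattern requires at least one whitespace character after <unk>, so a token at the very end of the string would be left in place.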
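\b is a word boundary: it matches only at a position between a non-word character (\W, roughly [^A-Za-z0-9_]) and a word character (\w, roughly [A-Za-z0-9_]), or at the start or end of the string next to a word character. In your original string, you were trying to find a boundary between a space and a < or >. Both sides are non-word characters, so \b does not match there.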
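The boundary rule can be checked directly. The following is a small sketch with made-up test strings: the same pattern fails when <unk> is surrounded by spaces and succeeds when it is surrounded by letters, because only then is there a word character next to the < and the >.

```python
import re

pattern = r'\b<unk>\b'

# Space and '<' are both non-word characters: no boundary, so no match.
surrounded_by_spaces = re.search(pattern, 'how <unk> are')

# 'w' and 'a' are word characters adjacent to '<' and '>':
# a boundary exists on both sides, so the pattern matches.
surrounded_by_letters = re.search(pattern, 'how<unk>are')

print(surrounded_by_spaces)           # None
print(surrounded_by_letters.group())  # <unk>
```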