I am unable to filter out a specific word in a line using Python's re module.
Suppose I want to match every word except "cat" in a line; the following code does not work:
re.search("(?!cat)", "a black cat is scary")
Please help.
You need to tell it what to actually search for. Remember, computers do what we tell them to do and nothing else.
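If you want to buy all the socks in a store except the black coloured ones, you go up to a shop assistant and say "I want all your socks except the black coloured ones."
What you did was essentially say "I don't want black coloured socks." A negative lookahead such as (?!cat) only states what must not come next; it does not consume any characters.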
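A quick check of what the original pattern actually matches (a small sketch; `text` and `positions` are just illustrative names). Because the lookahead consumes nothing, every "match" is an empty string, and it succeeds at every position except the one where "cat" starts.

```python
import re

text = "a black cat is scary"

# The lookahead alone consumes no characters, so re.search returns
# an empty match at position 0.
first = re.search("(?!cat)", text)
print(first.span(), repr(first.group()))  # (0, 0) ''

# Collect every position where the lookahead succeeds: all of them
# except index 8, where "cat" begins.
positions = [hit.start() for hit in re.finditer("(?!cat)", text)]
print(8 in positions)  # False
```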
re.search(r"(?!cat\b)\b\w+", "a black cat is scary")
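A minimal sketch of the same pattern, assuming you want every word rather than only the first one: re.search stops at the first match (here "a"), while re.findall keeps going through the line.

```python
import re

text = "a black cat is scary"
pattern = r"(?!cat\b)\b\w+"  # raw string so \b reaches the regex engine

print(re.search(pattern, text).group())  # a
print(re.findall(pattern, text))         # ['a', 'black', 'is', 'scary']
```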
The problem is in the regular expression: basically, you're telling it to find a place where "cat" can't be found, i.e. |a| |b|l|a|c|k| c|a|t| |i|s| |s|c|a|r|y|
(the pipes show every position where the regex engine can stop; re.search simply returns the empty match at the first one). You need to change the regular expression to \b(?!cat\b)\w+
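where:
\b asserts a position at a word boundary.
\w matches any word character (for Python 3 str patterns this includes Unicode letters, digits and the underscore; with the re.ASCII flag it is equal to [a-zA-Z0-9_]).
(?!cat\b) is a negative lookahead: it matches only when the next characters are not "cat" followed by the end of the word.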
This regular expression will skip cat but still match catastrophe. Running it on "a black cat is a catastrophe" gives the matches a, black, is, a and catastrophe (cat is skipped).
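The trailing \b inside the lookahead is what lets "catastrophe" through. A small comparison (a sketch; the second pattern, without that \b, is shown only for contrast):

```python
import re

text = "a black cat is a catastrophe"

# With \b inside the lookahead, only the whole word "cat" is rejected.
print(re.findall(r"\b(?!cat\b)\w+", text))  # ['a', 'black', 'is', 'a', 'catastrophe']

# Without it, any word that merely starts with "cat" is rejected too.
print(re.findall(r"\b(?!cat)\w+", text))    # ['a', 'black', 'is', 'a']
```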
EDIT:
The call failed because Python's default behaviour is to treat \b in an ordinary string literal as a backspace character, just like other escape sequences such as \n, \t and \r.
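A quick way to see the difference (a minimal sketch; only the r prefix changes between the two searches). Without it, the regex engine receives literal backspace characters, which the text does not contain, so nothing matches.

```python
import re

text = "a black cat is a catastrophe"

# In an ordinary string literal, \b is the backspace character (U+0008).
print("\b" == "\x08")          # True
print(len("\b"), len(r"\b"))   # 1 2

# Non-raw pattern: the engine looks for backspace characters -> no match.
# ("\\w" is doubled only to avoid the invalid-escape warning for "\w".)
print(re.search("\b(?!cat\b)\\w+", text))            # None

# Raw pattern: \b reaches the engine as a word-boundary assertion.
print(re.search(r"\b(?!cat\b)\w+", text).group())   # a
```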
The call needs to use a raw string:
re.search(r"\b(?!cat\b)\w+", "a black cat is a catastrophe")
And if you want to get all the matches as a list, use the re.findall function.
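If you also need to know where each word sits, re.finditer yields match objects instead of plain strings. A small sketch next to re.findall (compiling the pattern once with re.compile is just a convenience here):

```python
import re

text = "a black cat is a catastrophe"
pattern = re.compile(r"\b(?!cat\b)\w+")

# findall: just the matched strings, as a list.
print(pattern.findall(text))

# finditer: one match object per word, with its start offset in the text.
for m in pattern.finditer(text):
    print(m.start(), m.group())
```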
You need to use the re.sub method instead:
re.sub(r"cat ", "", "a black cat is scary") # a black is scary
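Note that the literal "cat " pattern also cuts the end off longer words and misses a "cat" at the very end of the line. A sketch of a word-boundary version, assuming you want to drop the whole word together with the whitespace in front of it; remove_word is a hypothetical helper name, not part of the re module.

```python
import re

def remove_word(text, word):
    # Hypothetical helper: assumes `word` consists of word characters.
    # \s* eats the whitespace before the word; \b...\b keeps "bobcat" or "cats" intact.
    # re.escape guards against regex metacharacters in the word.
    return re.sub(r"\s*\b" + re.escape(word) + r"\b", "", text)

print(re.sub(r"cat ", "", "a bobcat and a cat"))  # a boband a cat
print(remove_word("a bobcat and a cat", "cat"))   # a bobcat and a
print(remove_word("a black cat is scary", "cat")) # a black is scary
```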