I am trying to find instances of C or c that are not followed by +, including one located at the end of the string.

The answer regex from the online class is as follows:

pattern = r'\b[Cc]\b(?![\+\.])'

But I don't understand why it cannot be like this:

pattern = r'\b[Cc](?!+.)'

Can anyone explain why?

Much appreciated if you can enlighten me!

1 Answer

The regex from your class can be simplified to \b[Cc]\b(?!\+), i.e.:

  • the negative lookahead should contain only (escaped) +,
  • so brackets are not needed.

When you try the above regex on xxx c- c# c! c+ ac ca xxx c:

  • the first three occurrences of c are matched, as:
    • before them there is a word boundary,
    • after them there is also a word boundary ("-", "#" and "!" are not word chars),
    • none of them is followed by "+" (which the negative lookahead forbids),
  • fourth occurrence (c+) is not matched (the lookahead failed),
  • fifth occurrence (ac) is not matched (no word boundary before "c"),
  • sixth occurrence (ca) is not matched (no word boundary after "c"),
  • the seventh occurrence (terminal c) is matched (word boundary before, word boundary after, no + after).
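
The walk-through above can be checked with a short Python sketch. The variable names are only illustrative, and the test string is the one used in this answer; the list holds the start index of each match.

```python
import re

# Test string from the answer; the seven occurrences of "c" start at
# indices 4, 7, 10, 13, 17, 19 and 26.
text = "xxx c- c# c! c+ ac ca xxx c"

# Simplified class pattern: word boundary on both sides, no "+" right after.
first = re.compile(r'\b[Cc]\b(?!\+)')

first_starts = [m.start() for m in first.finditer(text)]
print(first_starts)  # [4, 7, 10, 26]
```

The occurrences in "c+" (13), "ac" (17) and "ca" (19) are missing from the result, for the reasons listed in the bullets.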

Now let's look at your second regex. It should not contain a dot after the +. In addition, the + must be escaped, otherwise the regex engine reports an error, because an unescaped + is a quantifier with nothing before it to repeat. So your second pattern should be corrected to \b[Cc](?!\+):

  • it also contains word boundary (\b) before,
  • but now no word boundary is required after it,
  • it also contains the negative lookahead as before.

This time the sixth occurrence of c in my test string is also matched, because the second pattern doesn't require any word boundary after c.
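
A minimal sketch of the same check for the second pattern, assuming Python's re module (which the r'...' syntax in the question suggests). It also tries the unescaped version from the question to show that it does not compile. The variable names are hypothetical.

```python
import re

text = "xxx c- c# c! c+ ac ca xxx c"

# Corrected second pattern: word boundary before only, no "+" right after.
second = re.compile(r'\b[Cc](?!\+)')
second_starts = [m.start() for m in second.finditer(text)]
print(second_starts)  # [4, 7, 10, 19, 26]

# The pattern exactly as written in the question: the bare "+" has nothing
# to repeat, so compiling it raises re.error.
try:
    re.compile(r'\b[Cc](?!+.)')
    unescaped_compiles = True
except re.error as exc:
    unescaped_compiles = False
    print("regex error:", exc)
```

Index 19 is the c in "ca", the only match that the first pattern rejects and this one accepts.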

So to sum up, it is up to you whether you require the word boundary after c. You asked for instances of "C" or "c" not followed by "+", which says nothing about a word boundary after "c", so the second pattern (after my corrections) is also OK.

I advise you to use the online regex tester at https://regex101.com/, as it gives good explanations of the pattern being tested.

Valdi_Bo