I'm learning a bit of regex in Python through an online course and I'm struggling to understand a particular expression. I've searched the Python re docs, but I'm still not sure why it returns the non-punctuation elements rather than the punctuation.

The code is:

import re
test_phrase = "This is a sentence, with! unnecessary: punctuation."
punc_remove = re.findall(r'[^,!:]+',test_phrase)
punc_remove

OUTPUT: ['This is a sentence',' with',' unnecessary',' punctuation.']

I think I understand what each character does, i.e. [] is a character set and ^ means "starts with". So anything starting with , ! or : will be returned? (Or at least that's how I'm probably mistakenly interpreting it.) And the + will match one or more of the pattern. But why is the output not something like:

OUTPUT: [', with','! unnecessary',': punctuation.']

Any explanation really appreciated!

  • In this case, `^` is not the start of the string. It will negate `[,!:]`, so `[^,!:]` means any character other than `,`, `!` or `:`. See https://docs.python.org/3/library/re.html#index-9 – Cubix48 Mar 19 '22 at 22:17
  • https://regex101.com/ Paste the regular expression there. If you hover over the regular expression, it will tell you what each element is doing. It also shows a breakdown of the pattern in the upper right corner. While not necessary for your example pattern, you can choose which regex engine is used, so if you're programming in Python you can test with a Python engine. –  Mar 19 '22 at 22:47

1 Answer

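Inside a character class, a leading ^ does not mean ‘starts with’: it means ‘not’. So the regex matches sequences of one or more characters other than `,`, `!` or `:`, which is why findall returns the text between the punctuation marks rather than the punctuation itself.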
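To make the difference concrete, here is a small sketch that runs the question's pattern next to a few variants. The variable names (`non_punc`, `punc_only`, `punc_led`, `anchored`, `cleaned`) and the extra patterns are illustrative assumptions, not part of the original code; the last two variants are just one possible way to get the output the question expected and to actually strip the punctuation.

```python
import re

test_phrase = "This is a sentence, with! unnecessary: punctuation."

# Negated class: runs of one or more characters that are NOT ',', '!' or ':'.
non_punc = re.findall(r'[^,!:]+', test_phrase)

# Positive class: each single character that IS ',', '!' or ':'.
punc_only = re.findall(r'[,!:]', test_phrase)

# One punctuation mark followed by a run of other characters.
# This produces the output the question expected.
punc_led = re.findall(r'[,!:][^,!:]+', test_phrase)

# Outside a character class, ^ is an anchor for the start of the string.
anchored = re.findall(r'^This', test_phrase)

# To remove the punctuation, substitute the positive class with ''.
cleaned = re.sub(r'[,!:]', '', test_phrase)

print(non_punc)   # ['This is a sentence', ' with', ' unnecessary', ' punctuation.']
print(punc_only)  # [',', '!', ':']
print(punc_led)   # [', with', '! unnecessary', ': punctuation.']
print(anchored)   # ['This']
print(cleaned)    # This is a sentence with unnecessary punctuation.
```

Note that the final `.` survives in every case because it is not listed in the character class.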

LeopardShark