
I've been trying to use a regex in Python to match either individual punctuation marks or runs of them. For example, I want to split out a group like '!?!' as well as a lone '@'.

I have the following regex: (["#$%&()*+,-/:;<=>@[\]^_`{|}~]|[.?!]+), which mostly does what I want, except that it captures periods individually (so I get . . . instead of ...).
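A minimal reproduction might look like the following. It assumes `re.findall` is the call in use (the post does not say which function was used), and the sample string is made up for illustration:

```python
import re

# Pattern exactly as posted above.
pattern = re.compile(r'''(["#$%&()*+,-/:;<=>@[\]^_`{|}~]|[.?!]+)''')

# Hypothetical sample text, not from the original post.
sample = "Wait... what!?! @home"

# Each period comes back as its own match, while '!?!' stays together.
print(pattern.findall(sample))  # ['.', '.', '.', '!?!', '@']
```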

What I don't understand is that if I move the , character in the first [] group somewhere else, it works fine... even if it's just one character to the right or left.

Is there some significance to its position? Why doesn't the pattern work properly when the comma stays where it is? (The character list is taken from string.punctuation.)

Thanks in advance. I've searched around and couldn't find anything... so hopefully this isn't too dumb of a question...

user157000

1 Answer

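In a character class (the square bracket syntax in regexes), a hyphen between two characters denotes a range of characters. You have ,-/ in your square brackets, which matches every character from , to / in code-point order, that is, any of , - . / Because the period falls inside that range, the first alternative matches each . on its own before [.?!]+ gets a chance to match the whole run. Moving the comma changes the endpoints of the range, so the set of matched characters changes with it.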
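As a rough sketch, the snippet below lists the characters the `,-/` range covers and tries one possible fix, escaping the hyphen so it is treated as a literal. The sample string is made up for illustration, and placing the hyphen first or last in the class would be an alternative fix:

```python
import re

# Every character covered by the range ,-/ (code points 0x2C through 0x2F).
covered = [chr(c) for c in range(ord(','), ord('/') + 1)]
print(covered)  # [',', '-', '.', '/']

# One possible fix: escape the hyphen so it no longer forms a range.
fixed = re.compile(r'''(["#$%&()*+,\-/:;<=>@[\]^_`{|}~]|[.?!]+)''')

# Hypothetical sample text, not from the original post.
sample = "Wait... what!?! a-b"

# Periods are now grouped by [.?!]+ and the hyphen still matches on its own.
print(fixed.findall(sample))  # ['...', '!?!', '-']
```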

Ned Batchelder
  • Addendum: to match a literal hyphen in a character class, it needs to be either first, or last: `[a-zA-Z-]` matches any English letter, or hyphen. – Amadan Jan 23 '18 at 00:39
  • Thank you both. It might have been because of the context but I couldn't see that character as a hyphen for some reason, and didn't understand why the minus sign was causing problems. The question was answered too fast for me to accept the answer, but I will do so asap – user157000 Jan 23 '18 at 00:44