1

I'm writing a program that compresses text by representing it as a sequence of numbers, but I don't know how to get the program to recognise punctuation as a separate item in the list.

E.g., in this sentence with a comma, the comma means that 'comma,' and 'comma' are treated as different words when using split(). I want to get 'comma', ',', 'comma' instead.

I don't want to get rid of the punctuation; I want it as a separate item in the list.

K. West

1 Answer

3

You can use re.split with a capturing group, which keeps the matched punctuation in the result:

>>> import re
>>> import string
>>> re.split('([{}])'.format(re.escape(string.punctuation)), "comma,comma")
['comma', ',', 'comma']
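For a full sentence, re.split on its own leaves the spaces inside the pieces and produces empty strings next to trailing punctuation. A minimal sketch of one way around that, assuming you split on whitespace first and then drop the empty pieces (`tokenize` and `PUNCT_CLASS` are made-up names, not part of any library):

```python
import re
import string

# Character class matching any single ASCII punctuation character.
PUNCT_CLASS = "[{}]".format(re.escape(string.punctuation))


def tokenize(text):
    """Split text into words and individual punctuation marks (hypothetical helper)."""
    tokens = []
    for chunk in text.split():
        # The capturing group keeps each punctuation mark in the result.
        parts = re.split("({})".format(PUNCT_CLASS), chunk)
        # re.split returns empty strings around leading, trailing, or
        # adjacent delimiters, so filter those out.
        tokens.extend(part for part in parts if part)
    return tokens


print(tokenize("in this sentence with a comma, the comma matters"))
```

Note that this treats every punctuation character as its own token, so something like "don't" comes back as three items.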
Francisco
  • It's kind of fortuitous that the backslash character in `string.punctuation` immediately precedes `']'`, that `'^'` isn't at the beginning of the character set, that `'[,-.]'` defines a character range which includes a literal hyphen, and so on. So this ends up handling everything properly except for backslashes, although I doubt it matters in practice (who uses backslashes in normal text?). If splitting on backslashes does matter, `re.escape(string.punctuation)` will fix this. –  Nov 10 '16 at 21:20
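To illustrate the point in the comment above, here is a small sketch comparing a character class built from the raw `string.punctuation` with one built from the escaped version. The variable names are only for illustration; the input is the three characters a, backslash, b.

```python
import re
import string

text = "a\\b"  # the characters: a, backslash, b

# Raw punctuation: inside the class, the backslash ends up escaping the
# following ']' instead of standing for itself.
unescaped = re.compile("([{}])".format(string.punctuation))

# Escaped punctuation: every special character is matched literally.
escaped = re.compile("([{}])".format(re.escape(string.punctuation)))

unescaped_parts = unescaped.split(text)
escaped_parts = escaped.split(text)

print(unescaped_parts)
print(escaped_parts)
```

With the raw class the backslash should not be treated as a delimiter, while the escaped class should split on it like any other punctuation mark.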