python: split string after comma and dots

Question

I have a piece of code which splits a string after commas and dots (but not when a digit is before or after a comma or dot):

import re

text = "This is, a sample text. Some more text. $1,200 test."
print(re.split(r'(?<!\d)[,.]|[,.](?!\d)', text))

The result is:

['This is', ' a sample text', ' Some more text', ' $1,200 test', '']

I don't want to lose the commas and dots. So what I am looking for is:

['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']

Besides, if the text ends with a dot, an empty string is produced at the end of the list. Furthermore, there is whitespace at the beginning of the split strings. Is there a better method without using re? How would you do this?

Like I showed in the example the commas and dots are "lost", but I want to preserve them. — Johnny, Jan 02 '14 at 23:32
Possible duplicate of http://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators — Steinar Lima, Jan 02 '14 at 23:33
Not quite a duplicate of that question, in this one the separators are not separate entries in the result. — Andrew Clark, Jan 02 '14 at 23:39
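For comparison, the technique from the linked question (wrapping the whole pattern in a capturing group so that re.split() also returns the separators) can be adapted here by gluing each separator back onto the piece before it. This is a minimal sketch that reuses the question's own regex; the variable names are illustrative only.

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# Capturing group: re.split() returns the matched separators as extra items,
# alternating with the text between them: [text, sep, text, sep, ..., text].
parts = re.split(r'((?<!\d)[,.]|[,.](?!\d))', text)

# Re-attach each separator to the text before it; the last text has none.
texts = parts[0::2]
seps = parts[1::2] + ['']
joined = [t + s for t, s in zip(texts, seps)]

# Strip surrounding whitespace and drop empty pieces.
result = [p.strip() for p in joined if p.strip()]
print(result)  # ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```

The separators do show up as separate entries at first, which is the point Andrew Clark raises; the zip step is what merges them back.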

score 9 · Accepted Answer · answered Jan 02 '14 at 23:34

Unfortunately, re.split() cannot split on a zero-length match in Python versions before 3.7, so unless you can guarantee that there will be whitespace after the comma or dot, you will need to use a different approach.

Here is one option that uses re.findall():

>>> import re
>>> text = "This is, a sample text. Some more text. $1,200 test."
>>> print(re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text))
['This is,', ' a sample text.', ' Some more text.', ' $1,200 test.', '']

This doesn't strip whitespace, and the pattern also yields an empty match at the very end of the string (its $ alternative can match zero characters there), but both are pretty easy to fix.
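One way to apply those fixes, sketched below, is to strip each match and keep only the non-empty ones; the findall pattern itself is unchanged from above.

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# Same findall pattern as above: a run of characters (a digit followed by a
# comma/dot may stay inside the run), ending at a comma/dot or at the end.
pieces = re.findall(r'(?:\d[,.]|[^,.])*(?:[,.]|$)', text)

# Strip surrounding whitespace and drop the trailing empty match.
result = [p.strip() for p in pieces if p.strip()]
print(result)  # ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```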

If it is a safe assumption that there will be whitespace after every comma and dot you want to split on, then we can just split the string on that whitespace, which makes it a little simpler:

>>> print(re.split(r'(?<=[,.])(?<!\d.)\s', text))
['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
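On Python 3.7 and later, re.split() accepts zero-width matches, so it is possible to split directly at the position after the comma or dot without assuming any whitespace follows. The sketch below assumes Python 3.7+ and mirrors the question's original rule (keep a comma or dot unsplit only when it has a digit on both sides); the name SPLIT_AFTER_PUNCT is just illustrative.

```python
import re

text = "This is, a sample text. Some more text. $1,200 test."

# Zero-width split point (requires Python 3.7+): the position right after a
# comma or dot, unless that comma/dot has a digit on both sides ("$1,200").
#   (?<=[,.])     previous character is a comma or dot
#   (?<!\d[,.])   ...and the character before it is not a digit, OR
#   (?!\d)        ...the next character is not a digit
SPLIT_AFTER_PUNCT = re.compile(r'(?<=[,.])(?:(?<!\d[,.])|(?!\d))')

pieces = SPLIT_AFTER_PUNCT.split(text)

# Leading spaces stay attached and the final dot leaves an empty piece,
# so strip and filter as before.
result = [p.strip() for p in pieces if p.strip()]
print(result)  # ['This is,', 'a sample text.', 'Some more text.', '$1,200 test.']
```

Because it follows the question's rule, this version splits after "3," in "I have 3, you have 4.", whereas the findall and whitespace patterns above treat any comma or dot preceded by a digit as part of the text.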
