
Possible Duplicate:
Python split() without removing the delimiter

I wish to split a string as follows:

text = " T?e  qu!ck ' brown 1 fox!     jumps-.ver. the 'lazy' doG?  !"
result -> (" T?e  qu!ck ' brown 1 fox!", "jumps-.ver.", "the 'lazy' doG?", "!")

So basically I want to split at ". ", "! " or "? ", removing the spaces at the split points but keeping the dot, exclamation mark or question mark.

How can I do this in an efficient way?

The str.split method takes only one separator. I wonder whether the best solution is to split on all spaces and then find the pieces that end with a dot, exclamation mark or question mark when constructing the required result.

Baz
    @Dominic Kexel The two questions are certainly related but they're not duplicates. – Baz Jan 31 '13 at 09:58

1 Answer


You can achieve this using a regular expression split:

>>> import re
>>> text = " T?e  qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG?  !"
>>> re.split(r'(?<=[.!?]) +', text)
[" T?e  qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']

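The regular expression '(?<=[.!?]) +' matches a sequence of one or more spaces (' +') only if it is preceded by a ., ! or ? character ('(?<=[.!?])'). The lookbehind is zero-width, so the punctuation character is not part of the match and stays attached to the preceding piece.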
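
If you need this in more than one place, the same pattern can be compiled once and wrapped in a small function. This is only a sketch: the names `_SENTENCE_GAP` and `split_keep_punctuation` are made up for illustration, and the input is the string from the question (with the run of five spaces after "fox!").

```python
import re

# Same pattern as above, compiled once. The name is hypothetical.
_SENTENCE_GAP = re.compile(r'(?<=[.!?]) +')

def split_keep_punctuation(text):
    """Split on runs of spaces that follow '.', '!' or '?'.

    The lookbehind consumes nothing, so the punctuation stays
    at the end of the preceding piece.
    """
    return _SENTENCE_GAP.split(text)

text = " T?e  qu!ck ' brown 1 fox!     jumps-.ver. the 'lazy' doG?  !"
parts = split_keep_punctuation(text)
print(parts)
```

Note that if the input ends with punctuation followed by spaces (for example "a. "), the last element of the result is an empty string, so you may want to strip the input first. If tabs or newlines should also count as separators, replacing ' +' with r'\s+' is one option.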

isedev