
The built-in <string>.split() procedure uses only whitespace to split the string.

I'd like to define a procedure, split_string, that takes two inputs: the string to split and a string containing all of the characters considered separators.

The procedure should return a list of strings that breaks the source string up at the separator characters. (In the expected output below, whitespace also acts as a separator.)

def split_string(source, separators):
    ...

>>> print(split_string("This is a test-of the,string separation-code!", ",!-"))
['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
Mike Müller
Takatjuta
    "The built-in .split() procedure works only uses whitespace to split the string." That's factually wrong. If you don't provide it with an argument then it will use whitespace. But if you do then it will use that argument as the delimiter. – DeepSpace Dec 27 '16 at 15:35
  • Also, what would be the output of `split_string('abcd', 'bc')`? – DeepSpace Dec 27 '16 at 15:39
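For reference, a minimal sketch of the str.split behaviour described in the first comment: with no argument it splits on runs of whitespace, and with an argument it splits on that exact string. A multi-character argument is therefore treated as one delimiter, not as a set of characters, which is why ",!-" alone does not do what the question wants.

```python
s = "This is a test-of the,string separation-code!"

# No argument: split on runs of whitespace
print(s.split())
# ['This', 'is', 'a', 'test-of', 'the,string', 'separation-code!']

# With an argument: split on that exact string
print(s.split("-"))
# ['This is a test', 'of the,string separation', 'code!']

# A multi-character argument is one delimiter, not a set of characters
print(s.split(",!-"))
# ['This is a test-of the,string separation-code!']
```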

2 Answers


re.split() with a character class of all the separators works:

>>> import re
>>> s = "This is a test-of the,string separation-code!"
>>> re.split(r'[ \-\,!]+', s)
['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code', '']
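A minimal sketch of how this could be wrapped into the split_string(source, separators) procedure from the question. It assumes, as the question's expected output implies, that whitespace always counts as a separator. re.escape keeps characters such as '-', ']' or '^' literal inside the character class, and the trailing '' shown above is filtered out.

```python
import re

def split_string(source, separators):
    # Assumption: whitespace is always a separator (\s), plus every
    # character in `separators`, escaped so it is literal inside [...]
    pattern = r"[\s" + re.escape(separators) + r"]+"
    # re.split leaves '' at either end when the string starts or ends
    # with a separator, so keep only non-empty pieces
    return [piece for piece in re.split(pattern, source) if piece]

print(split_string("This is a test-of the,string separation-code!", ",!-"))
# ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
```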

In your case, searching for the words themselves seems more useful, because it avoids the trailing empty string:

>>> re.findall(r"[\w']+", s)
['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
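The word-finding idea can also be driven by the separators argument instead of the fixed [\w'] class: match runs of characters that are neither whitespace nor separators. This is a sketch; the name split_string_findall is hypothetical, and treating whitespace as a separator is again an assumption taken from the question's example.

```python
import re

def split_string_findall(source, separators):
    # Negated class: any run of characters that is not whitespace and
    # not one of the (escaped) separators counts as a token.
    # '+' needs at least one character, so no empty strings come back.
    pattern = r"[^\s" + re.escape(separators) + r"]+"
    return re.findall(pattern, source)

print(split_string_findall("This is a test-of the,string separation-code!", ",!-"))
# ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
```

Unlike [\w']+, this keeps characters such as '.' inside a token unless they are listed as separators, so "3.14" stays one piece.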
Mike Müller

Here's a reusable function that also escapes regex special characters:

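import re

def escape_char(char):
    # Escapes a single character if it is a regex metacharacter
    # (re.escape() does the same for whole strings)
    special = ['.', '^', '$', '*', '+', '?', '\\', '[', ']', '|', '(', ')', '{', '}']
    return '\\{}'.format(char) if char in special else char

def split(text, *delimiters):
    # Join the escaped delimiters into one alternation, e.g. '!|,| '
    return re.split('|'.join(escape_char(x) for x in delimiters), text)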

It doesn't automatically remove empty entries, e.g.:

>>> split('Python, is awesome!', '!', ',', ' ')
['Python', '', 'is', 'awesome', '']
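If empty entries are unwanted, they can be filtered out after splitting. A minimal sketch, using a hypothetical helper named split_nonempty that builds the same alternation with re.escape. It also tries longer delimiters first, since a regex alternation takes the first alternative that matches at a given position.

```python
import re

def split_nonempty(text, *delimiters):
    # Hypothetical helper: put longer delimiters first so 'ab' wins
    # over 'a', escape each one, and join them into an alternation
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '|'.join(re.escape(d) for d in ordered)
    # Drop the empty strings re.split produces between adjacent
    # delimiters and at the ends of the text
    return [part for part in re.split(pattern, text) if part]

print(split_nonempty('Python, is awesome!', '!', ',', ' '))
# ['Python', 'is', 'awesome']
```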
fips