
I would like to split on several alternative separators when calling string.split():

>>> import string
>>> string.split('a n', ' ')
['a', 'n']

which is correct.

>>> string.split('a n"c', ' "')
['a n"c']
>>> string.split('a n"c', '[ |"]')
['a n"c']

The ideal split should be ['a', 'n', 'c'].

>>> string.split('a n" "c', '[ |"]')
['a n" "c']
>>> string.split('a n" "c', ' "')
['a n"', 'c']

The ideal split should be ['a', 'n', 'c'].

How can I do that?

Tim
  • `string.split()` is deprecated; you can call the method directly on strings. – Martijn Pieters Sep 12 '14 at 21:59
  • why is it deprecated? Isn't that similar to C++? (which is good imho) – Tim Sep 12 '14 at 21:59
  • Because `'a b c'.split()` works too; those are methods on the `str` object. – Martijn Pieters Sep 12 '14 at 22:01
  • It's not just deprecated, it's actually been removed in Python 3.x. – Roger Fan Sep 12 '14 at 22:01
  • Why removed? @RogerFan – Tim Sep 12 '14 at 22:02
  • `[x for x in 'a n" "c' if x.isalpha()]` would work too – Padraic Cunningham Sep 12 '14 at 22:04
  • @Tim Because it fits better as a method. You can call it on every string and you would never call it on something that isn't a string. And it's removed because, ideally, there should only be one right way to do things, and having `split` as a method makes more sense and is more convenient than having to import a module. – Roger Fan Sep 12 '14 at 22:04
  • @RogerFan: does that mean a script in Python 2.7 can't be run by Python 3.x? – Tim Sep 12 '14 at 22:07
  • @Tim In general, Python 2.x code is not always valid Python 3.x code. That's the whole point of Python 3, making all the backward-incompatible changes that they felt needed to be made. To see some of the most commonly used changes, see [this post](https://docs.python.org/3.0/whatsnew/3.0.html). – Roger Fan Sep 12 '14 at 22:10
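
A minimal sketch of the points raised in the comments above, assuming Python 3: `split` is called as a method on the string itself, and its separator argument is matched as one literal substring rather than as a set of characters. That is why `' "'` and `'[ |"]'` did not split the string as hoped. The variable name `text` is only illustrative.

```python
# str.split() is a method on str objects; no import is needed.
text = 'a n" "c'

# The separator is matched as one literal substring, not as a set of characters.
on_space = text.split(' ')        # splits at every single space
on_exact = text.split('" "')      # splits only where the exact substring '" "' occurs
on_pattern = text.split('[ |"]')  # no such literal substring, so nothing is split

print(on_space)
print(on_exact)
print(on_pattern)
```

Splitting on any one of several characters needs a regular expression, which `str.split()` does not interpret.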

1 Answer


str.split() is not that sophisticated; what you want is re.split() instead:

re.split(r'[ "]+', some_string)

Demo:

>>> import re
>>> re.split(r'[ "]+', 'a n" "c')
['a', 'n', 'c']
Martijn Pieters
  • Thanks, Martijn. When the string has leading or trailing whitespace, the result starts and ends with an empty string. For example, `re.split(r'[ "]+', ' a n" "c ')` returns `['', 'a', 'n', 'c', '']`. How can I avoid those empty strings? – Tim Sep 13 '14 at 00:09
  • @Tim: `re.findall(r'[^ "]+', ' a n" "c ')` . – PM 2Ring Sep 13 '14 at 04:10
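
The two ways of avoiding the empty strings discussed above can be compared side by side. This is a sketch, not part of the original answer: `split_on_any` is a hypothetical helper name, and its `separators` parameter is an assumption introduced for illustration.

```python
import re

def split_on_any(text, separators=' "'):
    """Hypothetical helper: return the runs of characters not in `separators`."""
    # re.escape guards characters such as ']' or '^' that are special inside [...].
    # Assumes `separators` is non-empty.
    pattern = '[^' + re.escape(separators) + ']+'
    return re.findall(pattern, text)

s = ' a n" "c '

# re.split() yields empty strings when the input starts or ends with a separator.
raw = re.split(r'[ "]+', s)

# Option 1: drop the empty strings after splitting.
filtered = [part for part in raw if part]

# Option 2: match the non-separator runs directly, as in the re.findall() comment.
found = split_on_any(s)

print(raw)
print(filtered)
print(found)
```

Both options give the same list here; `re.findall()` avoids the extra filtering pass because it never produces the empty pieces in the first place.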