446

I have some Python code that splits on commas but doesn't strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print(mylist)
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.

ivanleoncz
Mr_Chimp

10 Answers

794

Use a list comprehension: it is simpler than a for loop and just as easy to read.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
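As a minimal sketch, the same comprehension can be wrapped in a small helper that also drops pieces that are empty after stripping, in the spirit of the filter suggested in the comments. The name `split_and_strip` is hypothetical and chosen only for illustration; it is not part of the standard library.

```python
def split_and_strip(text, sep=","):
    """Split text on sep and strip whitespace from each piece.

    Hypothetical helper for illustration; pieces that are empty
    after stripping are dropped.
    """
    return [piece.strip() for piece in text.split(sep) if piece.strip()]


my_string = "blah, lots  ,  of ,  spaces, here "
print(split_and_strip(my_string))
# ['blah', 'lots', 'of', 'spaces', 'here']

print(split_and_strip("a, ,b,,"))
# ['a', 'b']
```

Whether empty entries should be dropped depends on the data; remove the `if piece.strip()` clause to keep them.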

Sean Vieira
  • 1
    Super good! I added one item as follows to get rid of the blank list entries. > text = [x.strip() for x in text.split('.') if x != ''] – RandallShanePhD Jul 28 '17 at 19:41
  • @Sean: was invalid/incomplete python code your "original intent of the post"? According to the review wankers it was: https://stackoverflow.com/review/suggested-edits/21504253. Can you please tell them otherwise by making the correction if they are wrong (again)? – Forage Nov 25 '18 at 10:19
  • The original was copy-pasted from a REPL (if I remember correctly) and the goal was understanding of the underlying concept (using list comprehension to perform an operation) - but you're right, it makes more sense if you *see* that list comprehension produces a new list. – Sean Vieira Nov 26 '18 at 00:24
47

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Reading Glenn Maynard's comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:

$ time ./list_comprehension.py  # [word.strip() for word in string.split(',')]
real    0m22.876s

$ time ./map_with_lambda.py     # map(lambda s: s.strip(), string.split(','))
real    0m25.736s

$ time ./map_with_str.strip.py  # map(str.strip, string.split(','))
real    0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
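For anyone who wants to repeat the comparison, here is a rough sketch using the standard `timeit` module instead of three separate scripts. The function names and the repetition count of 100,000 are assumptions made for this example, and the timings will vary by machine and Python version. Note that on Python 3 `map()` returns a lazy iterator, so it is wrapped in `list()` to get a comparable result.

```python
import timeit

string = "blah, lots  ,  of ,  spaces, here "


def with_comprehension():
    return [word.strip() for word in string.split(",")]


def with_map_lambda():
    # list() is needed on Python 3, where map() returns a lazy iterator.
    return list(map(lambda s: s.strip(), string.split(",")))


def with_map_str_strip():
    return list(map(str.strip, string.split(",")))


candidates = (with_comprehension, with_map_lambda, with_map_str_strip)

# All three approaches should agree on the result.
expected = ["blah", "lots", "of", "spaces", "here"]
assert all(func() == expected for func in candidates)

for func in candidates:
    # number=100_000 is an arbitrary choice for this sketch.
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```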

wjandrea
Sean
28

Split using a regular expression. Note that I made the case more general by adding leading spaces. The list comprehension removes the empty strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^\s+ doesn't match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>

Here's why you need ^\s+:

>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
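To make it visible why the list comprehension is needed, this short sketch prints the raw result of the split before filtering. It reuses the pattern from this answer, written as a raw string; the variable names `raw` and `cleaned` are chosen just for this example.

```python
import re

pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
string = "  blah, lots  ,  of ,  spaces, here "

# The leading and trailing whitespace both count as separators,
# so the split leaves an empty string at each end.
raw = pattern.split(string)
print(raw)
# ['', 'blah', 'lots', 'of', 'spaces', 'here', '']

# Filtering on truthiness removes those empty strings.
cleaned = [x for x in raw if x]
print(cleaned)
# ['blah', 'lots', 'of', 'spaces', 'here']
```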

tbc0
  • 9
    I believe `[x.strip() for x in my_string.split(',')]` is more pythonic for the question asked. Maybe there are cases where my solution is necessary. I'll update this content if I run across one. – tbc0 Jul 27 '14 at 23:32
  • Why is `^\s+` necessary? I've tested your code without it and it doesn't work, but I don't know why. – laike9m Apr 21 '15 at 09:33
  • If I use `re.compile("^\s*,\s*$")`, result is `[' blah, lots , of , spaces, here ']`. – laike9m Apr 21 '15 at 15:40
  • @laike9m, I updated my answer to show you the difference `^\s+` makes. As you can see for yourself, `^\s*,\s*$` doesn't return desired results, either. So if you want split with a regexp, use `^\s+|\s*,\s*|\s+$`. – tbc0 Apr 21 '15 at 17:05
  • The first match is empty if the leading pattern (^\s+) doesn't match so you get something like [ '', 'foo', 'bar' ] for the string "foo, bar". – Steeve McCauley Apr 01 '16 at 12:36
  • `re.split('\s*,\s*', " blah, lots , of , spaces, here ".strip())` returns `['blah', 'lots', 'of', 'spaces', 'here']` and avoids the special cases in the regular expression. – awatts Jul 22 '16 at 08:23
  • I don't think replacing the special cases in the regexp with a call to a string method is a clear win. – tbc0 Jul 23 '16 at 22:01
  • `re.split(r"[^\w']+", input.strip())` for splitting on anything other than word characters and apostrophe. `re.split("[ ,]+", input.strip())` for splitting on just spaces and commas. Both consume multiple 'split' characters so there are no empty strings in the output – Baldrickk May 24 '18 at 07:59
23

Just remove the white space from the string before you split it.

mylist = my_string.replace(' ','').split(',')
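One caveat, sketched here with the example string from the first comment below: `replace(' ', '')` removes every space, including spaces inside an item, whereas stripping each piece after the split only touches the ends.

```python
my_string = "you just, broke this"

# Removing every space before splitting also glues the words together.
print(my_string.replace(" ", "").split(","))
# ['youjust', 'brokethis']

# Splitting first and stripping each piece keeps the inner spaces.
print([piece.strip() for piece in my_string.split(",")])
# ['you just', 'broke this']
```

For single-word items, as in the question's sample data, both approaches give the same list.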
user489041
  • 21
    Kind of a problem if the items separated by commas contain embedded spaces, e.g. `"you just, broke this"`. – Robert Rossney Nov 01 '10 at 19:45
  • 2
    Geeze, a -1 for this. You guys are tough. It solved his problem, providing his sample data was only single words and there was no specification that the data would be phrases. But w/e, I guess thats how you guys roll around here. – user489041 Nov 02 '10 at 15:53
  • Well thanks anyway, user. To be fair though I specifically asked for split and then strip() and strip removes leading and trailing whitespace and doesn't touch anything in between. A slight change and your answer would work perfectly, though: mylist = mystring.strip().split(',') although I don't know if this is particularly efficient. – Mr_Chimp Nov 03 '10 at 09:29
14

I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub

Brad Montgomery
  • 4
    Your example would not work on strings containing spaces. "for, example this, one" would become "for", "examplethis", "one". Not saying it's a BAD solution (it works perfectly on my example) it just depends on the task in hand! – Mr_Chimp Feb 01 '12 at 16:11
  • Yep, that's very correct! You could probably adjust the regexp so it can handle strings with spaces, but if the list comprehension works, I'd say stick with it ;) – Brad Montgomery Feb 03 '12 at 04:36
5

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))

user470379
2

re (as in regular expressions) allows splitting on multiple characters at once:

>>> import re
>>> string = "blah, lots  ,  of ,  spaces, here "
>>> re.split(', ', string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can pass re.split a character class to get a "split-on-this-or-that" effect.

>>> re.split('[, ]', string)
['blah',
 '',
 'lots',
 '',
 '',
 '',
 '',
 'of',
 '',
 '',
 '',
 'spaces',
 '',
 'here',
 '']

Unfortunately, that's ugly, but a filter will do the trick:

>>> list(filter(None, re.split('[, ]', string)))
['blah', 'lots', 'of', 'spaces', 'here']

Voila!
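An alternative along the lines discussed in the comments, shown here only as a sketch: strip the whole string first, then split on a comma together with any whitespace around it. This avoids the empty strings, so no filter is needed, and it keeps spaces that occur inside an item.

```python
import re

string = "blah, lots  ,  of ,  spaces, here "

# Strip the outer whitespace first, then split on a comma together
# with any whitespace around it.
result = re.split(r"\s*,\s*", string.strip())
print(result)
# ['blah', 'lots', 'of', 'spaces', 'here']

# Unlike splitting on '[, ]', spaces inside an item are preserved.
phrases = re.split(r"\s*,\s*", "you just, broke this".strip())
print(phrases)
# ['you just', 'broke this']
```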

Dannid
  • 2
    Why not just `re.split(' *, *', string)`? – Paul Tomblin Nov 16 '15 at 16:12
  • 4
    @PaulTomblin good idea. One can also have done this: `re.split('[, ]*',string)` for the same effect. – Dannid Nov 25 '15 at 19:37
  • Dannid I realized after writing that that it doesn't strip whitespace at the beginning and the end like @tbc0's answer does. – Paul Tomblin Nov 25 '15 at 20:07
  • @PaulTomblin heh, and my rebuttal `[, ]*` leaves an empty string at the end of the list. I think filter is still a nice thing to throw in there, or stick to list comprehension like the top answer does. – Dannid Nov 25 '15 at 21:04
2
import re
result = [x for x in re.split(',| ', your_string) if x != '']

This works fine for me.

Zieng
1
s = 'bla, buu, jii'

# Split on commas, then strip the whitespace around each item before printing.
sp = s.split(',')
for st in sp:
    print(st.strip())
Pang
1
import re
mylist = [x for x in re.split(r'\s*[,\s]\s*', string) if x]

Simply put, this splits on a comma or a whitespace character, with or without surrounding whitespace, and drops any empty strings left at the ends.

Please try!

Hrvoje
ghchoi