638

I'm looking for the Python equivalent of this Java code, which should produce the array below:

String str = "many   fancy word \nhello    \thi";
String whiteSpaceRegex = "\\s+";
String[] words = str.split(whiteSpaceRegex);

["many", "fancy", "word", "hello", "hi"]
Martin Thoma
siamii

4 Answers

1166

The str.split() method without an argument splits on whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
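A small sketch contrasting the no-argument form with an explicit single-space separator, using the string from the question. The variable names are just for illustration.

```python
s = "many   fancy word \nhello    \thi"

# No argument: split on runs of any whitespace (spaces, tabs, newlines)
# and ignore leading/trailing whitespace, so no empty strings appear.
words = s.split()
print(words)  # ['many', 'fancy', 'word', 'hello', 'hi']

# Explicit ' ' separator: split on every single space only, so runs of
# spaces produce empty strings and '\n' / '\t' are not treated as separators.
parts = s.split(' ')
print(parts)  # ['many', '', '', 'fancy', 'word', '\nhello', '', '', '', '\thi']
```

So `split()` and `split(' ')` are not interchangeable; only the no-argument form matches the Java `\\s+` behaviour from the question.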
Boris Verkhovskiy
Sven Marnach
  • Also good to know is that if you want the first word only (which means passing `1` as second argument), you can use `None` as the first argument: `s.split(None, 1)` – yak Nov 13 '11 at 19:00
  • If you only want the first word, use *str.partition*. – Raymond Hettinger Nov 13 '11 at 19:11
  • @yak : Can you please edit your comment. The way it sounds right now is that s.split(None, 1) would return 1st word only. It rather gives a list of size 2. First item being the first word, second - rest of the string. `s.split(None, 1)[0]` would return the first word only – user3527975 Feb 25 '16 at 21:43
  • Also the default split trims whitespace from either side so you don't have to call str.strip() e.g. `" asdf asdf \t\n ".split()` returns `['asdf', 'asdf']` – lee penkman Nov 24 '16 at 01:32
  • does `str.split()` do something like `re.split('\s+', string)` behind the scenes? – galois Dec 20 '16 at 23:05
  • @galois No, it uses a custom implementation (which is faster). Also note that it handles leading and trailing whitespace differently. – Sven Marnach Dec 21 '16 at 07:53
  • Sven, in my case a line could contain words like `'Kishor Pawar' 'Sven Marnach'`. What would you suggest? – Kishor Pawar Jan 02 '19 at 07:46
  • @KishorPawar It's rather unclear to me what you are trying to achieve. Do you want to split on whitespace, but disregard whitespace inside single-quoted substrings? If so, you can look into [`shlex.split()`](https://docs.python.org/3/library/shlex.html#shlex.split), which may be what you are looking for. Otherwise I suggest asking a new question – you will get a much quicker and more detailed answer. – Sven Marnach Jan 02 '19 at 10:12
  • Thank you @SvenMarnach. You guessed the case correctly. I will take a look at shlex.split() – Kishor Pawar Jan 02 '19 at 10:22
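A rough sketch of the options raised in the comments above (maxsplit, `str.partition`, `shlex.split`), again using the question's string; the variable names are only illustrative.

```python
import shlex

s = "many   fancy word \nhello    \thi"

# sep=None plus maxsplit=1: split once on whitespace, giving
# [first word, rest of the string]; the whitespace before the rest is consumed.
first, rest = s.split(None, 1)
print(first)        # many
print(repr(rest))   # 'fancy word \nhello    \thi'

# In Python 3 the same call can also be written with a keyword argument.
assert s.split(maxsplit=1) == [first, rest]

# str.partition splits once on a *literal* separator, not on any whitespace,
# so it finds the first word here but would not treat '\t' or '\n' as a break.
head, sep, tail = s.partition(" ")
print(head)         # many

# shlex.split follows shell-style quoting, keeping quoted phrases together.
print(shlex.split("'Kishor Pawar' 'Sven Marnach'"))  # ['Kishor Pawar', 'Sven Marnach']
```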
89
import re
s = "many   fancy word \nhello    \thi"
re.split(r'\s+', s)  # raw string avoids an invalid-escape warning for '\s'
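One difference worth knowing, and the one Sven Marnach mentions in the comments: `re.split` keeps an empty string at each end when the input starts or ends with whitespace, while `str.split()` does not. A sketch with a padded version of the question's string (the padding is an assumption added for illustration):

```python
import re

s = "  many   fancy word \nhello    \thi\n"

# re.split leaves '' at the start and end because the string begins and
# ends with whitespace.
with_regex = re.split(r'\s+', s)
print(with_regex)      # ['', 'many', 'fancy', 'word', 'hello', 'hi', '']

# str.split() with no argument drops those empty strings.
with_str = s.split()
print(with_str)        # ['many', 'fancy', 'word', 'hello', 'hi']

# Stripping first makes re.split agree for non-empty input
# (for an empty string, re.split still returns [''] while ''.split() gives []).
stripped_first = re.split(r'\s+', s.strip())
print(stripped_first)  # ['many', 'fancy', 'word', 'hello', 'hi']
```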
Óscar López
28

Using split() is the most Pythonic way to split a string on whitespace.

It's also useful to remember that calling split() on a string that contains no whitespace returns the whole string as a single-element list.

Example:

>>> "ark".split()
['ark']
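A few related edge cases, sketched as assertions: the no-argument form never returns empty strings, so an empty or whitespace-only input gives an empty list, whereas an explicit separator always returns at least one element.

```python
# No whitespace: the whole string comes back as a one-element list.
assert "ark".split() == ["ark"]

# Empty or whitespace-only input: an empty list, with no '' entries.
assert "".split() == []
assert " \t\n ".split() == []

# With an explicit separator, even the empty string yields one element.
assert "".split(" ") == [""]
```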
digitalnomd
22

Another approach uses the re module. Instead of splitting the whole sentence on whitespace, it does the reverse: it matches all the words.

>>> import re
>>> s = "many   fancy word \nhello    \thi"
>>> re.findall(r'\S+', s)
['many', 'fancy', 'word', 'hello', 'hi']

The regex above matches runs of one or more non-whitespace characters.
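Because `findall` returns only the matched runs, it never produces the empty strings that `re.split` can. A small sketch checking that against `str.split()`, plus `re.finditer` to recover each word's starting offset (something `split()` does not give you):

```python
import re

s = "many   fancy word \nhello    \thi"

# findall returns only the non-whitespace runs, so it matches str.split().
words = re.findall(r'\S+', s)
assert words == s.split()

# finditer yields match objects, which also carry each word's position.
spans = [(m.group(), m.start()) for m in re.finditer(r'\S+', s)]
print(spans)  # [('many', 0), ('fancy', 7), ('word', 13), ('hello', 19), ('hi', 29)]
```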

Avinash Raj