Split a string with unknown number of spaces as separator in Python

Question

I need a function similar to str.split(' '), but there may be more than one space between the meaningful characters, and the number of spaces varies. Something like this:

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '
ss = s.magic_split()
print(ss)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

Can I somehow use regular expressions to catch those spaces in between?

score 156 · Accepted Answer · edited Apr 07 '22 at 09:00

If you don't pass any arguments to str.split(), it will treat runs of whitespace as a single separator:

>>> ' 1234    Q-24 2010-11-29         563   abc  a6G47er15'.split()
['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
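
To make the difference concrete, here is a minimal sketch comparing the two call forms on a short, made-up string (the sample string and variable names are illustrative, not from the question):

```python
s = ' a  b '

# With an explicit ' ' separator every single space is a delimiter,
# so consecutive, leading and trailing spaces yield empty strings.
with_sep = s.split(' ')
print(with_sep)     # ['', 'a', '', 'b', '']

# With no argument, a run of whitespace counts as one separator and
# leading/trailing whitespace produces no empty strings.
without_sep = s.split()
print(without_sep)  # ['a', 'b']
```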

answered Nov 30 '10 at 01:12 by aaronasterling · edited Apr 07 '22 at 09:00 by Boris Verkhovskiy

Note that without arguments, split() splits on "any whitespace", so tabs (for example) will also be treated as separators (and absorbed into tab-space sequences as a single separator). – Karl Knechtel Nov 30 '10 at 05:55

If that's actually a problem (it almost never is), then use `[subs for subs in s.split(' ') if subs]` – aaronasterling Nov 30 '10 at 19:37
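
As a rough illustration of the point made in these comments, the sketch below splits on the space character only and drops the empty pieces, so a tab stays inside its field. The sample string is made up for the example:

```python
s = 'a\tb   c  d'

# Split on single spaces only, then discard the empty strings that
# consecutive spaces leave behind. The tab is not a separator here.
fields = [part for part in s.split(' ') if part]
print(fields)     # ['a\tb', 'c', 'd']

# For comparison, split() with no argument also splits on the tab.
print(s.split())  # ['a', 'b', 'c', 'd']
```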

score 23 · Answer 2 · edited Sep 02 '20 at 16:27

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '
ss = s.split()
print(ss)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

answered Nov 30 '10 at 01:13 by Bill Lynch · edited Sep 02 '20 at 16:27 by Boris Verkhovskiy

Danny Sanchez · Answer 3 · 2018-03-22T13:42:57.743

If you have single spaces amid your data (like an address in one field), here's a solution for when the delimiter has two or more spaces:

with open("textfile.txt") as f:
    content = f.readlines()

    for line in content:
        # Get all variable-length spaces down to two. Then use two spaces as the delimiter.
        while line.replace("   ", "  ") != line:
            line = line.replace("   ", "  ")

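        # strip() removes the trailing newline and any leading/trailing spaces,
        # which would otherwise end up in the first or last field.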
        data = line.strip().split("  ")
        print(data)
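
One possible alternative, not part of this answer, is to let a regular expression treat any run of two or more spaces as the delimiter in a single step. This is a sketch under that assumption; the sample line is invented for illustration:

```python
import re

# Hypothetical line where fields contain single spaces and are
# separated by two or more spaces.
line = '  John Smith    12 Main Street   Springfield  \n'

# ' {2,}' matches a run of at least two spaces; single spaces inside
# a field are left alone. strip() removes the newline and outer spaces.
fields = re.split(r' {2,}', line.strip())
print(fields)  # ['John Smith', '12 Main Street', 'Springfield']
```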

score 3 · Answer 4 · edited Apr 04 '22 at 22:52

To split lines by multiple spaces while keeping single spaces in strings:

with open("textfile.txt") as f:
    for line in f:
        # Filter on the stripped piece so a lone newline or space is dropped too.
        fields = [i.strip() for i in line.split('  ') if i.strip()]
        print(fields)

answered Jul 16 '19 at 16:00 by Guy de Carufel · edited Apr 04 '22 at 22:52 by Boris Verkhovskiy

Note that this answer is significantly simpler than [this other one](https://stackoverflow.com/a/49430099/95852). But of course neither one directly answers the exact question. Rather, they answer [this related question](https://stackoverflow.com/q/12866631/95852), which not surprisingly has an [equivalent, except better explained answer](https://stackoverflow.com/a/12866686/95852). – John Y Apr 04 '22 at 22:12

score 3 · Answer 5 · edited Oct 19 '21 at 12:50

We can also use the re module's split function here.

import re

sample = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '

# Use a raw string so \s is passed to the regex engine unchanged.
word_list = re.split(r"\s+", sample.strip())

print(word_list)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

I hope this might help someone
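
The call to strip() matters with re.split: if the string starts or ends with whitespace, the pattern matches there too and empty strings appear at the ends of the result. A small sketch with a shortened, made-up sample:

```python
import re

sample = '  1234   Q-24  '

# Without strip(), the leading and trailing whitespace runs are matched
# as separators, leaving an empty string on each end.
unstripped = re.split(r'\s+', sample)
print(unstripped)  # ['', '1234', 'Q-24', '']

# Stripping first avoids the empty strings.
stripped = re.split(r'\s+', sample.strip())
print(stripped)    # ['1234', 'Q-24']
```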

answered Sep 06 '21 at 19:04 by hitesh bedre · edited Oct 19 '21 at 12:50 by TonyMoutaux

Muthukumar · Answer 6 · 2020-05-11T17:24:47.907

There are many solutions to this question.

1.) Using split() is the simplest method

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15              '
s = s.split()
print(s)


Output >> ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

2.) Another way is to use the findall() function from the re module, which you need to import at the top of your Python file.

import re

def magic_split(text):
    # \S+ matches each run of non-whitespace characters.
    return re.findall(r'\S+', text)

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15'
s = magic_split(s)
print(s)
print(magic_split('    he  ll   o'))


Output >> ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
Output >> ['he', 'll', 'o']

3.) If you only want to remove leading whitespace (at the beginning) and trailing whitespace (at the end), use strip().

s = '   hello          '
output = s.strip()
print(output)


Output >> hello
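
As a quick sanity check on methods 1 and 2, the sketch below compares str.split() with re.findall(r'\S+', ...) on a handful of made-up inputs. It only shows agreement on these samples and is not a proof that the two are equivalent for every string:

```python
import re

# Illustrative inputs, including an empty and an all-space string.
samples = [
    ' 1234    Q-24 2010-11-29 ',
    '    he  ll   o',
    'tab\tseparated\nlines',
    '',
    '   ',
]

# Both approaches return the runs of non-whitespace characters.
agree = all(text.split() == re.findall(r'\S+', text) for text in samples)
print(agree)  # True
```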
