split python string without empty strings

Question

The following code:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('\n')
print(chunks)

Correctly prints out:

['Welcome', 'to', 'PythonExamples', 'Welcome', 'to', 'PythonExamples']

I want to split the string into strings that start with 'Welcome\n' so I have tried the following:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('Welcome\n')
print(chunks)

But this prints out:

['', 'to\nPythonExamples\n', 'to\nPythonExamples']

Notice how the first entry is empty. How can I split the string so that the output is the following?

['to\nPythonExamples\n', 'to\nPythonExamples']

Just to clarify, the only thing that bothers you are the empty strings that might come up? — barshopen, Feb 05 '21 at 18:07
I assumed that because the empty string is there then I am parsing it incorrectly. I wonder why the empty value is there — Harry Boy, Feb 05 '21 at 18:10

barshopen · Accepted Answer · 2021-02-06T02:11:55.227

If I understand correctly, you want to avoid empty strings. You can use a list comprehension:

chunks = [x for x in str.split('Welcome\n') if x]

That should solve your problem. Why?

First of all, the `if x` at the end of the list comprehension means that only truthy values are included in the list (or rather, falsy values such as the empty string are omitted).
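As a small illustration of that truthiness check (the variable names `text`, `parts`, `kept` and `explicit` are mine, not from the question), the short `if x` form gives the same result as comparing against the empty string explicitly:

```python
# Use a name other than str so the built-in type is not shadowed.
text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
parts = text.split('Welcome\n')

# An empty string is falsy, any non-empty string is truthy.
print(bool(''), bool('to'))  # False True

kept = [x for x in parts if x]            # relies on truthiness
explicit = [x for x in parts if x != '']  # spelled-out comparison

print(kept)              # ['to\nPythonExamples\n', 'to\nPythonExamples']
print(kept == explicit)  # True
```

Both forms are equivalent here because every element of the list is a string.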

But why did you get '' in the first place? The easiest way to explain it is to point you at the CPython source code for split:

while (maxcount-- > 0) {
    pos = FASTSEARCH(str+i, str_len-i, sep, sep_len, -1, FAST_SEARCH);
    if (pos < 0)
        break;
    j = i + pos;
    SPLIT_ADD(str, i, j);
    i = j + sep_len;
}

Basically, the split function looks for the next occurrence of sep and adds the substring that runs from the end of the previous occurrence (i) up to the new match (j); it does this at most maxcount times. Since Welcome\n is found at position 0 and the "last occurrence" index also starts at 0, the first substring runs from 0 to 0, which results in an empty string.
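To make the loop easier to follow, here is a rough pure-Python sketch of the same idea. `split_sketch` is a hypothetical helper written for this answer, not CPython's actual implementation, and it assumes a non-empty separator:

```python
def split_sketch(s, sep, maxcount=-1):
    """Hypothetical pure-Python mirror of the C loop above (illustration only)."""
    if not sep:
        raise ValueError("empty separator")
    pieces = []
    i = 0  # index just past the previous separator
    while maxcount != 0:  # a negative maxcount means "no limit"
        j = s.find(sep, i)  # absolute index of the next separator, or -1
        if j < 0:
            break
        pieces.append(s[i:j])  # empty when the separator sits exactly at i
        i = j + len(sep)
        maxcount -= 1
    pieces.append(s[i:])  # whatever is left after the last separator
    return pieces


text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
print(split_sketch(text, 'Welcome\n'))
# ['', 'to\nPythonExamples\n', 'to\nPythonExamples']
```

On the first pass i and j are both 0, so `s[0:0]` is appended, and that is the leading empty string.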

By the way, you would also get empty strings for a string like this, where the separator appears twice in a row:

'Welcome\nWelcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

The result with your code, without my change:

['', '', 'to\nPythonExamples\n', 'to\nPythonExamples']

@HarryBoy, I've added this part to my answer. Let me know if you have any more questions :) GL — barshopen, Feb 05 '21 at 18:53

Ryan Schaefer · Answer 2 · 2021-02-05T18:41:26.887

You could filter out the empty entries. Also, avoid using str as a variable name, because it shadows the built-in str type. Since '' is falsy, you don't even need a comparison.

inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = list(filter(None, inp.split('Welcome\n')))
print(chunks)

edited Feb 05 '21 at 18:41

answered Feb 05 '21 at 18:08


Using `lambda` in this case is really unnecessary. You can use `filter(None, chunks)` directly. – alec_djinn Feb 05 '21 at 18:20
Hmm, it might be the case that if it is `None` it implicitly does `lambda x: x` as the function though. – Ryan Schaefer Feb 05 '21 at 18:21
Interesting. It would be better to use `None`, because that case is handled directly in the source code: https://github.com/python/cpython/blob/5f18c223391eef8c7d01241b51a7b2429609dd84/Python/bltinmodule.c#L567. I'm updating the answer with that improvement. – Ryan Schaefer Feb 05 '21 at 18:41
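For anyone comparing the variants discussed in these comments, a quick check (variable names are mine) suggests that `filter(None, ...)`, `filter(lambda x: x, ...)` and the list comprehension all keep the same elements:

```python
inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
parts = inp.split('Welcome\n')

with_none = list(filter(None, parts))            # None: keep truthy items
with_lambda = list(filter(lambda x: x, parts))   # explicit identity function
with_comprehension = [x for x in parts if x]     # list comprehension

print(with_none)
# ['to\nPythonExamples\n', 'to\nPythonExamples']
print(with_none == with_lambda == with_comprehension)  # True
```

The `None` form just avoids calling a Python-level function for every element.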

alec_djinn · Answer 3 · 2021-02-05T19:30:39.957

One very clean and Pythonic way would be to use filter() with None. As an aside, str is the name of a built-in type in Python (not a keyword), so you should not use it as a variable name, because doing so shadows the built-in.

text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = text.split('Welcome\n')
chunks = filter(None, chunks)
print(list(chunks))
#['to\nPythonExamples\n', 'to\nPythonExamples']

edited Feb 05 '21 at 19:30

answered Feb 05 '21 at 18:16


`chunks = list(filter(None, chunks))` otherwise `chunks` is printed as a filter object. – aneroid Feb 05 '21 at 18:29
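A short sketch of why the `list(...)` call matters in Python 3 (variable names are mine): `filter` returns a lazy, one-shot iterator, so it prints as an object rather than its contents and is empty once it has been consumed.

```python
text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

chunks = filter(None, text.split('Welcome\n'))
print(type(chunks).__name__)  # filter

first_pass = list(chunks)   # consumes the iterator
second_pass = list(chunks)  # nothing left to yield

print(first_pass)   # ['to\nPythonExamples\n', 'to\nPythonExamples']
print(second_pass)  # []
```

Converting to a list once and keeping that list avoids the surprise of an exhausted iterator later on.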
