3

The following code:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('\n')
print(chunks)

Correctly prints out:

['Welcome', 'to', 'PythonExamples', 'Welcome', 'to', 'PythonExamples']

I want to split the string into strings that start with 'Welcome\n' so I have tried the following:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('Welcome\n')
print(chunks)

But this prints out:

['', 'to\nPythonExamples\n', 'to\nPythonExamples']

Notice how the first entry is empty. How can I split it so that the output is the following?

['to\nPythonExamples\n', 'to\nPythonExamples']
barshopen
Harry Boy

3 Answers

4

If I understand correctly, you want to avoid empty strings. You can use a list comprehension:

chunks = [x for x in str.split('Welcome\n') if x]

That should solve your problem. Why?

First, the `if x` at the end of the list comprehension means that only truthy values are kept; falsy values such as the empty string are omitted.
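As a small illustration of that filtering step, here is a sketch using the string from the question. The names `text`, `raw` and `chunks` are my own choices (picked to avoid shadowing `str`), not anything required by Python.

```python
text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

# split() yields a leading '' because the string starts with the separator
raw = text.split('Welcome\n')

# '' is falsy, so `if x` drops it and keeps every non-empty piece
chunks = [x for x in raw if x]

print(raw)
print(chunks)
```

Note that this drops every empty piece, not only the leading one, which is usually what you want here.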

But why did you get '' in the first place? It is easiest to point you at the source code for split:

while (maxcount-- > 0) {
    pos = FASTSEARCH(str+i, str_len-i, sep, sep_len, -1, FAST_SEARCH);
    if (pos < 0)
        break;
    j = i + pos;
    SPLIT_ADD(str, i, j);
    i = j + sep_len;
}

Basically, split looks for the next occurrence of sep and takes the substring from the end of the previous occurrence (i) up to the start of the new one (j); it repeats this at most maxcount times. Since Welcome\n is found at position 0 and the end of the "previous occurrence" is also 0, it takes the substring from 0 to 0, which is an empty string.
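The loop can be mirrored roughly in Python to make the empty slice visible. This is only a sketch, not CPython's implementation: `split_sketch` is a hypothetical helper name, `str.find` stands in for `FASTSEARCH`, the separator is assumed to be non-empty, and the final append of the tail is not part of the excerpt above (it is added so the result matches `str.split`).

```python
def split_sketch(s, sep, maxcount=-1):
    """Rough Python rendering of the C loop; illustrative only."""
    parts = []
    i = 0  # end of the previous separator (start of the next piece)
    while maxcount != 0:  # a negative maxcount means "no limit"
        pos = s.find(sep, i)  # absolute index of the next separator, or -1
        if pos < 0:
            break
        parts.append(s[i:pos])  # empty when the separator sits exactly at i
        i = pos + len(sep)
        maxcount -= 1
    parts.append(s[i:])  # assumption: the tail after the last separator
    return parts


text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
print(split_sketch(text, 'Welcome\n'))
```

On the first pass `i` and `pos` are both 0, so `s[0:0]` is appended, and that is the empty first entry.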

By the way, you would also get empty strings for a string like this:

'Welcome\nWelcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

The result of your code, without my change:

['', '', 'to\nPythonExamples\n', 'to\nPythonExamples']

barshopen
2

You could filter out the empty entries. Also, avoid using str as a variable name, because it shadows the built-in type. Since '' is falsy, you don't even need a comparison.

inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = list(filter(None, inp.split('Welcome\n')))
print(chunks)
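To see why reusing the name `str` is worth avoiding, here is a small sketch. The function `demo_shadowing` is a hypothetical name used only to keep the shadowing local, so the built-in stays intact elsewhere.

```python
def demo_shadowing():
    # Inside this function, the local name hides the built-in type
    str = 'Welcome\nto\nPythonExamples'
    try:
        str(42)  # tries to call the string object, not the type
    except TypeError as exc:
        return type(exc).__name__
    return 'no error'


result = demo_shadowing()
print(result)
```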
Ryan Schaefer
  • Using `lambda` in this case is really unnecessary. You can use `filter(None, chunks)` directly. – alec_djinn Feb 05 '21 at 18:20
  • Hmm, it might be the case that if it is `None` it implicitly does `lambda x: x` as the function though. – Ryan Schaefer Feb 05 '21 at 18:21
  • Interesting. It would be better to use `None`, because that is what the source code checks for: https://github.com/python/cpython/blob/5f18c223391eef8c7d01241b51a7b2429609dd84/Python/bltinmodule.c#L567 so I am updating the answer with that improvement. – Ryan Schaefer Feb 05 '21 at 18:41
2

One very clean and Pythonic way is to use filter() with None. As an aside, str is a built-in type in Python (not a keyword, but still a name you should not reuse for a variable).

text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = text.split('Welcome\n')
chunks = filter(None, chunks)
print(list(chunks))
#['to\nPythonExamples\n', 'to\nPythonExamples']
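For comparison, the sketch below checks that `filter(None, ...)`, an explicit `lambda x: x`, and a list comprehension with `if x` all give the same result on this input. The variable names are mine. In Python 3, `filter()` returns a lazy iterator, which is why it is wrapped in `list()` here.

```python
text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
pieces = text.split('Welcome\n')

# Three ways of dropping falsy items such as ''
with_none = list(filter(None, pieces))
with_lambda = list(filter(lambda x: x, pieces))
with_comprehension = [x for x in pieces if x]

print(with_none == with_lambda == with_comprehension)
```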
alec_djinn