
When I'm splitting a string "abac" I'm getting undesired results.

Example

print("abac".split("a"))

Why does it print:

['', 'b', 'c']

instead of

['b', 'c']

Can anyone explain this behavior and guide me on how to get my desired output?

Thanks in advance.

Jab
arti8719
  • `a` is the separator, `split()` will return all the words in between the separators without them. – Vasilis G. Nov 20 '18 at 21:01
  • Because this is how `split` works and is documented: "If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, `'1,,2'.split(',')` returns `['1', '', '2']`)" https://docs.python.org/3.7/library/stdtypes.html#str.split – DeepSpace Nov 20 '18 at 21:03
  • `split` and `join` are implemented so that `x.join(s.split(x)) == s` for any string `s` and non-empty string `x`. – chepner Nov 20 '18 at 21:06
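
The round-trip property mentioned in the last comment can be checked directly. This is only a minimal sketch; the variable names `s`, `sep`, and `parts` are chosen for illustration:

```python
# Splitting on a separator and joining with the same separator
# should give back the original string.
s = "abac"
sep = "a"

parts = s.split(sep)
print(parts)            # ['', 'b', 'c']

rebuilt = sep.join(parts)
print(rebuilt)          # abac
print(rebuilt == s)     # True
```

The leading empty string is what makes the round trip work: without it, joining `['b', 'c']` with `"a"` would give `"bac"`, not `"abac"`.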

4 Answers


When you split a string in Python, you keep everything between your delimiters, even when that is an empty string.

For example, if you had a list of letters separated by commas:

>>> "a,b,c,d".split(',')
['a', 'b', 'c', 'd']

If your list had some missing values you might leave the space in between the commas blank:

>>> "a,b,,d".split(',')
['a', 'b', '', 'd']

The start and end of the string act as delimiters themselves, so if you have a leading or trailing delimiter you will also get this "empty string" sliced out of your main string:

>>> "a,b,c,d,,".split(',')
['a', 'b', 'c', 'd', '', '']

>>> ",a,b,c,d".split(',')
['', 'a', 'b', 'c', 'd']

If you want to get rid of any empty strings in your output, you can use the filter function.
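
For instance, a small sketch of the filter approach applied to the string from the question (the names `parts`, `non_empty`, and `also_non_empty` are only illustrative):

```python
# Split first, then drop the empty strings.
parts = "abac".split("a")
print(parts)                      # ['', 'b', 'c']

# filter(None, ...) keeps only truthy items; an empty string is falsy.
# In Python 3, filter returns a lazy iterator, so wrap it in list().
non_empty = list(filter(None, parts))
print(non_empty)                  # ['b', 'c']

# An equivalent list comprehension, if you prefer that style.
also_non_empty = [p for p in parts if p]
print(also_non_empty)             # ['b', 'c']
```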

If instead you only want to get rid of this behavior at the edges of your main string, you can strip the delimiters off first:

>>> ",,a,b,c,d".strip(',')
'a,b,c,d'

>>> ",,a,b,c,d".strip(',').split(',')
['a', 'b', 'c', 'd']
jfbeltran

As @DeepSpace pointed out (referring to the docs):

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).

Therefore I'd suggest using a better delimiter, such as a comma. If this is the formatting you're stuck with, you can use the builtin filter() function, as suggested in this answer. When passed None as the function, it removes any "empty" strings. In Python 3, filter() returns an iterator, so wrap it in list() to get a list back.

sample = 'abac'
# filter() returns a lazy iterator in Python 3, so convert it to a list
filtered_sample = list(filter(None, sample.split('a')))
print(filtered_sample)
# ['b', 'c']
Jab

In your example, "a" is what's called a delimiter. It acts as a boundary between the characters before it and after it. So, when you call split, it takes the characters before and after each "a" and puts them into the list. Since there is nothing in front of the first "a" in the string "abac", the first piece is an empty string, and that gets added to the list as well.
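
A short sketch that makes the boundaries visible (the names `text` and `pieces` are just for illustration): each "a" produces one cut, so two occurrences of "a" give three pieces, the first of which is empty.

```python
text = "abac"
pieces = text.split("a")

print(pieces)                # ['', 'b', 'c']
print(len(pieces))           # 3
print(text.count("a") + 1)   # 3: one more piece than there are delimiters

# The first piece is whatever comes before the first "a": nothing at all.
print(repr(pieces[0]))       # ''
```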

Yang K

split will return the characters between the delimiters you specify (or between an end of the string and a delimiter), even if there aren't any, in which case it will return an empty string. (See the documentation for more information.)

In this case, if you don't want any empty strings in the output, you can use filter to remove them:

list(filter(lambda s: len(s) > 0, "abac".split("a")))
Harry Cutts