Splitting a string using the empty string as the delimiter yields leading empty string but no trailing empty string

Question

Suppose you have this expression in Java:

"adam".split("")

This is telling Java to split "adam" using the empty string ("") as the delimiter. This yields:

["", "a", "d", "a", "m"]

Why does Java include an empty string at the start, but not at the end? Using this logic, shouldn't the result have been:

["", "a", "d", "a", "m", ""]
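A minimal program to reproduce this might look like the sketch below. The class name `SplitDemo` is only for illustration. The output described above is what Java 7 and earlier produce. Starting with Java 8, the `String.split` documentation states that a zero-width match at the beginning of the string never produces an empty leading substring, while a positive-width match at the beginning still does. The second line contrasts the two cases.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // Split "adam" using the empty regular expression as the delimiter.
        String[] parts = "adam".split("");
        System.out.println(Arrays.toString(parts));

        // For contrast: a positive-width match at the start (the leading comma)
        // produces a leading empty string on every Java version.
        System.out.println(Arrays.toString(",a,b".split(",")));
    }
}
```

On Java 7 and earlier the first line prints `[, a, d, a, m]`, as in the question. On Java 8 and later it prints `[a, d, a, m]`. The second line prints `[, a, b]` on either version.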

@marcog: Haha, I was afraid of making the title **that** descriptive. ;) But hey, if it works. — Adam Paynter, Dec 28 '10 at 21:11
I tend to err on the side of being more descriptive in the title. :) — moinudin, Dec 28 '10 at 21:13
@marcog: I would be curious to know if that title holds the record for most occurrences of a word ("string"). :) — Adam Paynter, Dec 29 '10 at 16:38
Okay, perhaps a bit excessive. But then the +1 on my comment means at least *someone* likes it descriptive. :) — moinudin, Dec 29 '10 at 16:42

moinudin · Accepted Answer · score 10 · 2010-12-28T20:45:55.053

The delimiter is a regular expression. The regular expression "" matches at the very beginning of the string (before the a in adam). The docs state:

Splits this string around matches of the given regular expression.

Therefore the method will split around the match before the a. The docs also say:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

and

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Therefore, although there will also be a match at the end of the string, the trailing empty string that would result is discarded. Hence the leading empty string, but no trailing empty string. If you want the trailing empty string, just pass a negative value as a second argument:

"adam".split("", -1);

This works, because of this quote from the docs:

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
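The effect of the limit argument can be checked side by side with a short sketch like the following. The class name is only illustrative, and the limit `10` is taken from the comment thread below as an example of a positive limit larger than the number of pieces.

```java
import java.util.Arrays;

public class LimitComparison {
    public static void main(String[] args) {
        int[] limits = {0, -1, 10};
        for (int limit : limits) {
            // Same empty-regex split of "adam", only the limit changes.
            String[] parts = "adam".split("", limit);
            System.out.println("limit " + limit + ": " + Arrays.toString(parts));
        }
    }
}
```

On Java 8 and later this prints `[a, d, a, m]` for limit 0 and `[a, d, a, m, ]` for both -1 and 10. On Java 7 and earlier each result also starts with an empty string. In every version, only limit 0 drops the trailing empty string.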

To answer the question "why aren't there empty strings in the middle?": a regular expression produces at most one match at any given position in the string. Therefore there cannot be two matches between two consecutive characters, so, going back to my first quote from the docs, no additional empty strings appear between the letters.
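The "one match per position" point can be made concrete by driving a `java.util.regex.Matcher` directly. In the sketch below, the helper `emptyMatchStarts` is hypothetical, written only for this illustration. The empty pattern matches exactly once at each of the five positions in `"adam"`, from before the first character to after the last. The match at position 0 is the one behind the leading empty string on older JDKs, and the match at position 4 is the one behind the trailing empty string that a limit of zero discards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchPositions {
    // Hypothetical helper: collects the start index of every match of the empty regex.
    static List<Integer> emptyMatchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile("").matcher(input);
        while (m.find()) {
            // Each match is zero-width, so start() == end().
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // One match per position, including before the first and after the last character.
        System.out.println(emptyMatchStarts("adam")); // [0, 1, 2, 3, 4]
    }
}
```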

edited Dec 28 '10 at 20:45

answered Dec 28 '10 at 20:40

moinudin

What is interesting is the motivation behind it, especially since `split("", 10)` will still return an empty string at the end. – Nikita Rybak Dec 28 '10 at 20:43
@Nikita My guess is it was accidental at first, but then they didn't want to break backwards compatibility, so they introduced the "If n is non-positive" part. – moinudin Dec 28 '10 at 20:49
No, it wasn't an accident; that behavior was deliberately copied from Perl's `split`. However, Perl would **not** return the empty token at the beginning like Java does. No matter what pattern is used, or what chunk limit is specified, a zero-length match at the beginning of the target string never results in an empty leading token in Perl's `split`. – Alan Moore Dec 28 '10 at 23:43
@AlanMoore Are you saying that Java wanted to copy Perl's behavior, but failed to do so? – Didier A. Feb 19 '15 at 19:51
@didibus: That's right. The basic functionality was meant to be the same, but they got some of the features wrong (like leading empty tokens) and left others out entirely, like using capturing groups to treat the delimiters (or parts of them) as additional tokens. Many of Perl split's higher-level features would be impossible to reproduce in Java, but I see no reason why they couldn't implement captured tokens (as I call them). That's the most annoying of the missing features, in my opinion. – Alan Moore Feb 19 '15 at 22:32
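Java's `split` offers no equivalent of the captured tokens described above, but they can be approximated by driving a `Matcher` by hand. The following is only a sketch: `splitWithCaptures` is a hypothetical helper, not a JDK method. It assumes limit-0 semantics (trailing empty strings dropped), skips capturing groups that did not participate in a match, and, like Perl and Java 8+, emits no leading empty token for a zero-width match at the very start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturedTokens {
    // Hypothetical helper: splits around regex matches and also emits the text
    // of each capturing group inside every delimiter match.
    static List<String> splitWithCaptures(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0;
        while (m.find()) {
            // A zero-width match at position 0 yields no leading empty token.
            if (m.end() == 0) {
                continue;
            }
            tokens.add(input.substring(index, m.start()));
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) { // skip groups that did not participate
                    tokens.add(m.group(g));
                }
            }
            index = m.end();
        }
        tokens.add(input.substring(index));
        // Mimic a limit of zero: drop trailing empty strings.
        int size = tokens.size();
        while (size > 0 && tokens.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(tokens.subList(0, size));
    }

    public static void main(String[] args) {
        System.out.println(splitWithCaptures("1+2-3", "([+-])")); // [1, +, 2, -, 3]
    }
}
```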

score 6 · Answer 2 · answered Dec 28 '10 at 20:30

Looking at the API for the split method is this text: "Trailing empty strings are therefore not included in the resulting array."
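The rule is easiest to see with an ordinary delimiter. In the sketch below, the input `"a,b,,"` and the class name are just examples. The one-argument call behaves exactly like passing a limit of `0`, and both drop the two trailing empty strings.

```java
import java.util.Arrays;

public class LimitZeroDemo {
    public static void main(String[] args) {
        String input = "a,b,,";
        // split(regex) behaves as if split(regex, 0) had been called...
        String[] oneArg = input.split(",");
        String[] limitZero = input.split(",", 0);
        // ...so both discard the trailing empty strings: [a, b]
        System.out.println(Arrays.toString(oneArg));
        System.out.println(Arrays.toString(limitZero));
        System.out.println(Arrays.equals(oneArg, limitZero)); // true
    }
}
```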

answered Dec 28 '10 at 20:30

jzd

The word "therefore" suggests there is some more context that ought to be quoted. – BoltClock Dec 28 '10 at 20:33
Great answer. How embarrassing, I was caught not reading the documentation! – Adam Paynter Dec 28 '10 at 20:33
"This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array." and "If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded." – moinudin Dec 28 '10 at 20:34
@Adam Paynter don't worry about it. I was not sure when I first read your question. I was surprised to see it spelled out in the javadoc. Even though I have used this method many times I never noticed it. – jzd Dec 28 '10 at 20:35
Do we know *why* the *trailing* empty strings are discarded but not the *leading* empty string? – Adam Paynter Dec 28 '10 at 20:40
To be compatible with Perl's `split`, the leading empty token **should** be suppressed in this case, as I explained in my comment to @marcog's answer. I'd call that a bug in Java's `split` (the specification, not the implementation), but we're stuck with it now. – Alan Moore Dec 28 '10 at 23:59

score 2 · Answer 3 · answered Dec 28 '10 at 20:34

Yes, but there are also empty Strings between "a" and "d", "d" and "a", and "a" and "m", and they do not appear in the returned array either.

The split() method removes the other occurrences of that empty String.

answered Dec 28 '10 at 20:34

Lukasz

Fair enough. But why would it choose to keep the first empty string if it discards all the others? Just seems like an odd decision. – Adam Paynter Dec 28 '10 at 20:38
But are there empty Strings between the empty Strings? – jzd Dec 28 '10 at 20:40
No, it does not remove other occurrences of empty strings, only trailing ones. Read my answer for a detailed explanation. – moinudin Dec 28 '10 at 20:41
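The correction above can be illustrated with an ordinary delimiter. The input and class name in this sketch are only examples. An empty string between two adjacent delimiters survives, and only the trailing ones are removed. With the empty-regex delimiter there are simply no empty strings in the middle to keep, because the empty pattern matches only once per position.

```java
import java.util.Arrays;

public class MiddleEmptyDemo {
    public static void main(String[] args) {
        // Two adjacent commas produce an empty string in the middle...
        String[] parts = "a,,b,,".split(",");
        // ...which is kept, while the trailing empty strings are discarded: [a, , b]
        System.out.println(Arrays.toString(parts));
        System.out.println(parts.length); // 3
    }
}
```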
