
Say we have `val string: kotlin.String = "1;2;3;"` and would like to split it into its numbers. We can try the following:

```kotlin
val numbers = string.split(";".toRegex())
//gives: [1, 2, 3, ]
```

The trailing empty String is included in the result of CharSequence.split.
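A minimal runnable check of the Kotlin side, assuming Kotlin on the JVM; `splitKotlin` is a hypothetical helper name used only for illustration. The `String`-delimiter overload of `split` keeps the trailing empty entry as well:

```kotlin
// Hypothetical helper: Kotlin's CharSequence.split(Regex) with the default limit of 0
fun splitKotlin(input: String): List<String> = input.split(";".toRegex())

fun main() {
    val parts = splitKotlin("1;2;3;")
    println(parts)               // [1, 2, 3, ]
    println(parts.size)          // 4: the last element is "" after the trailing ";"
    println("1;2;3;".split(";")) // [1, 2, 3, ] (String-delimiter overload behaves the same)
}
```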

On the other hand, if we look at Java Strings, the result is different:

```kotlin
val numbers2 = (string as java.lang.String).split(";")
//gives: [1, 2, 3]
```

This time, using `java.lang.String.split`, the result does not include the trailing empty String. This behaviour is actually intended, according to the corresponding Javadoc:

> This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
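The Java behaviour can be reproduced from Kotlin without the `java.lang.String` cast by calling `java.util.regex.Pattern.split` directly, which the `String.split` Javadoc describes as equivalent. This is a sketch; `javaSplit` is a hypothetical helper name:

```kotlin
import java.util.regex.Pattern

// Calls java.util.regex.Pattern.split directly, i.e. the JDK semantics
// without any remapping of the limit argument.
fun javaSplit(input: String, regex: String, limit: Int): List<String> =
    Pattern.compile(regex).split(input, limit).toList()

fun main() {
    // limit = 0, which the one-argument java.lang.String.split uses: trailing empty strings are removed
    println(javaSplit("1;2;3;", ";", 0))   // [1, 2, 3]
    // Negative limit: trailing empty strings are kept
    println(javaSplit("1;2;3;", ";", -1))  // [1, 2, 3, ]
}
```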

In Kotlin's version, though, 0 is also the default limit argument (as documented), yet internally Kotlin maps that 0 to the negative value -1 when `java.util.regex.Pattern::split` is called:

```kotlin
nativePattern.split(input, if (limit == 0) -1 else limit).asList()
```
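To make the mapping concrete, here is a hypothetical re-creation of the quoted line (`kotlinStyleSplit` is an illustrative name, not a stdlib function, and the `require` check assumes the documented non-negative limit). Newer Kotlin versions may implement `Regex.split` differently internally, so treat this as a model of the behaviour rather than the actual source:

```kotlin
import java.util.regex.Pattern

// Hypothetical model of the quoted line: a Kotlin limit of 0 is passed to
// Pattern.split as -1, so trailing empty strings are always kept.
fun kotlinStyleSplit(input: CharSequence, pattern: Pattern, limit: Int = 0): List<String> {
    require(limit >= 0) { "Limit must be non-negative, but was $limit" }
    return pattern.split(input, if (limit == 0) -1 else limit).asList()
}

fun main() {
    val viaSketch = kotlinStyleSplit("1;2;3;", Pattern.compile(";"))
    val viaStdlib = "1;2;3;".split(";".toRegex())
    println(viaSketch)              // [1, 2, 3, ]
    println(viaSketch == viaStdlib) // true
}
```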

It seems to work as intended, but I'm wondering why the language restricts the Java API here, since the Java behaviour for a limit of 0 is no longer available.

azizbekian
s1m0nw1
  • I don't know why they chose to make it that way, but at least to me it feels more intuitive. If using a regex, you can use a negative lookahead instead: `;(?!$)` or `;(?!;*$)` – Bubletan Feb 09 '18 at 01:50
  • I've always considered Java's `limit` semantics a chaos. It's haphazard, self-inconsistent and virtually impossible to memorize. – Marko Topolnik Feb 09 '18 at 09:10
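A quick check of the lookahead suggestion from the comment above, assuming Kotlin's `Regex` (backed by `java.util.regex` on the JVM). Note that it avoids the empty trailing entry but leaves the final delimiter attached to the last item, so it does not reproduce Java's `[1, 2, 3]` exactly:

```kotlin
// Regexes from the comment; `\$` escapes the dollar sign inside a Kotlin string literal.
val notAtEnd = Regex(";(?!\$)")         // ";" not immediately followed by end of input
val notBeforeRun = Regex(";(?!;*\$)")   // ";" not followed by only more ";" up to end of input

fun main() {
    println("1;2;3;".split(notAtEnd))       // [1, 2, 3;]
    println("1;2;3;;".split(notBeforeRun))  // [1, 2, 3;;]
}
```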

1 Answer


The implementation implies that what is lost in Kotlin is the behavior `java.lang.String.split` shows with `limit = 0`. From my point of view, it was removed to make the possible options consistent in Kotlin.

Consider the string `a:b:c:d:` and the pattern `:`.

Take a look at what we can have in Java:

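| `limit` | Result |
|---|---|
| `limit < 0` | `[a, b, c, d, ]` |
| `limit = 0` | `[a, b, c, d]` |
| `limit = 1` | `[a:b:c:d:]` |
| `limit = 2` | `[a, b:c:d:]` |
| `limit = 3` | `[a, b, c:d:]` |
| `limit = 4` | `[a, b, c, d:]` |
| `limit = 5` | `[a, b, c, d, ]` (from here on, the same as with `limit < 0`) |
| `limit = 6` | `[a, b, c, d, ]` |
| ... | ... |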
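The table can be regenerated with a short sketch like the following, assuming the JVM; it uses `Pattern.split`, which the Javadoc states gives the same result as `String.split(regex, limit)`. `javaTable` is a hypothetical helper:

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: maps each limit to the result of Pattern.split,
// which has the same limit semantics as java.lang.String.split(regex, limit).
fun javaTable(input: String, delimiter: String, limits: IntRange): Map<Int, List<String>> {
    val pattern = Pattern.compile(Pattern.quote(delimiter))
    return limits.associateWith { limit -> pattern.split(input, limit).toList() }
}

fun main() {
    for ((limit, parts) in javaTable("a:b:c:d:", ":", -1..6)) {
        println("limit = $limit -> $parts")
    }
}
```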

The `limit = 0` option is unique: the trailing `:` is neither turned into an additional empty entry (as with `limit < 0` or `limit >= 5`) nor retained in the last resulting item (as with `limit` in `1..4`).

It seems to me that the Kotlin API improves consistency here: there's no special case that, in a sense, loses the information about the last delimiter followed by an empty string. That delimiter is always kept, either inside the last resulting item or as a trailing empty entry.
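For comparison, a sketch of the same table for Kotlin's `split` (the hypothetical `kotlinTable` helper uses the `String`-delimiter overload). `limit = 0` behaves like Java's negative limit, `1..4` match Java, and a negative limit is rejected with an `IllegalArgumentException` rather than given a special meaning:

```kotlin
// Hypothetical helper: maps each limit to the result of Kotlin's own split.
// In Kotlin, 0 means "no limit" and negative limits are not allowed.
fun kotlinTable(input: String, delimiter: String, limits: IntRange): Map<Int, List<String>> =
    limits.associateWith { limit -> input.split(delimiter, limit = limit) }

fun main() {
    for ((limit, parts) in kotlinTable("a:b:c:d:", ":", 0..6)) {
        println("limit = $limit -> $parts")
    }
    // A negative limit throws instead of switching to different semantics
    val rejected = runCatching { "a:b:c:d:".split(":", limit = -1) }
    println(rejected.exceptionOrNull() is IllegalArgumentException) // true
}
```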

IMO, the Kotlin function better fits the principle of least astonishment. The zero limit in `java.lang.String.split`, on the contrary, looks more like a special value that modifies the method's semantics. So do the negative values, which evidently don't make intuitive sense as a limit and aren't quite clear without digging through the Javadoc.

hotkey
  • It makes sense because in Kotlin it is easy to manipulate the result, e.g. to remove trailing empty strings with `string.split(";".toRegex()).dropLastWhile { it.isEmpty() }` – Naetmul Feb 09 '18 at 06:07
  • Historically, Kotlin's `split` was modeled after `split` in Python, where it works "consistently with itself" – voddan Feb 09 '18 at 12:20
  • I'm still maximally astonished at how `"ab".split("")` with kotlin.String returns `["","a","b",""]`, java.lang.String returns `["","a","b"]` on JVM 7 and `["a","b"]` on [JVM 8](https://stackoverflow.com/a/27477312/4506528). I will probably never pass an empty string to `split` from now on... – Hay Apr 23 '18 at 23:08
  • Oh, come on, voddan, it's consistent with most other programming languages; Java's is the one that's wrong. – Luiz Felipe Dec 04 '20 at 23:24
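Naetmul's suggestion can be wrapped in a small helper (`splitDroppingTrailingEmpty` is a hypothetical name) to get results matching Java's `limit = 0` for inputs like the ones discussed above:

```kotlin
// Hypothetical helper: a normal Kotlin split followed by dropping trailing
// empty strings, approximating java.lang.String.split(regex) with limit = 0.
fun splitDroppingTrailingEmpty(input: String, regex: Regex): List<String> =
    input.split(regex).dropLastWhile { it.isEmpty() }

fun main() {
    println(splitDroppingTrailingEmpty("1;2;3;", ";".toRegex()))    // [1, 2, 3]
    println(splitDroppingTrailingEmpty("a:b:c:d::", ":".toRegex())) // [a, b, c, d]
}
```

One difference I'm aware of: for an empty input, Java's `"".split(";")` returns `[""]`, whereas this helper returns an empty list, because the single empty string is itself dropped.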