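String.split splits the given String. If you split "[882,337]" on "[" or "," or "]", then you actually have four tokens:
"", "882", "337", ""
That is, an empty string before the opening "[", then "882" and "337", and another empty string after the closing "]".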
But as you have called String.split(delimiter), this calls String.split(delimiter, limit) with a limit of zero.
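A quick way to check that the one-argument form really behaves like a limit of zero is to compare the two results directly. This is a minimal sketch; the delimiter regex [\[,\]] (a character class matching "[", "," or "]") is an assumption, since the regex from the question is not shown, and the class name is hypothetical.

```java
import java.util.Arrays;

class SplitLimitDefault {
    // Assumed delimiter regex: a character class matching '[', ',' or ']'.
    static final String DELIMS = "[\\[,\\]]";

    public static void main(String[] args) {
        String input = "[882,337]";
        String[] oneArg = input.split(DELIMS);       // split(regex)
        String[] zeroLimit = input.split(DELIMS, 0); // split(regex, 0)
        // Both calls produce the same array.
        System.out.println(Arrays.equals(oneArg, zeroLimit)); // true
    }
}
```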
From the documentation:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
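The three regimes described in that quote can be seen side by side on a small input with trailing delimiters. This is a minimal sketch; the input "a,b,," and the class name are illustrative only.

```java
import java.util.Arrays;

class SplitLimitModes {
    public static void main(String[] args) {
        String s = "a,b,,";
        // n > 0: pattern applied at most n - 1 times; the rest stays in the last entry.
        System.out.println(Arrays.toString(s.split(",", 2)));  // [a, b,,]
        // n < 0: applied as often as possible; trailing empty strings are kept.
        System.out.println(Arrays.toString(s.split(",", -1))); // [a, b, , ]
        // n == 0: applied as often as possible; trailing empty strings are dropped.
        System.out.println(Arrays.toString(s.split(",", 0)));  // [a, b]
    }
}
```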
So in this configuration the trailing empty string is discarded, but the leading one is not. You are therefore left with exactly what you have: "", "882", "337".
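The difference between the default limit of zero and a negative limit on the actual input looks like this. It is a sketch that again assumes the delimiter regex [\[,\]]; the class name is hypothetical.

```java
import java.util.Arrays;

class LeadingEmptyToken {
    public static void main(String[] args) {
        String input = "[882,337]";
        String delims = "[\\[,\\]]"; // assumed: split on '[', ',' or ']'
        // Limit 0 (the default): the trailing "" is dropped, the leading "" stays.
        System.out.println(Arrays.toString(input.split(delims)));     // [, 882, 337]
        // Negative limit: every token is kept, including the trailing "".
        System.out.println(Arrays.toString(input.split(delims, -1))); // [, 882, 337, ]
    }
}
```

The leading empty string is the reason a loop over the result has to start at index 1.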
Usually, to tokenize something like this, one would go for a combination of replaceAll and split:
final String[] tokens = input.replaceAll("^\\[|\\]$", "").split(",");
This first strips off the opening (^\[) and closing (\]$) brackets and then splits on ",". This way you don't need the somewhat obtuse program logic of starting your loop from an arbitrary index.
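Wrapped into a small self-contained class, the replaceAll().split() approach can also convert the tokens to int values. This is a sketch; the class name and the parseInts helper are hypothetical, and it assumes well-formed input such as "[882,337]".

```java
import java.util.Arrays;

class BracketListParser {
    // Strips one leading '[' and one trailing ']', then splits on ','.
    static String[] tokenize(String input) {
        return input.replaceAll("^\\[|\\]$", "").split(",");
    }

    // Hypothetical helper: converts each token to an int.
    static int[] parseInts(String input) {
        return Arrays.stream(tokenize(input)).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("[882,337]")));  // [882, 337]
        System.out.println(Arrays.toString(parseInts("[882,337]"))); // [882, 337]
    }
}
```

Note that an empty list "[]" becomes "", and "".split(",") returns a single empty token, which Integer.parseInt would reject; real code may want to handle that case explicitly.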
As an alternative, for more complex tokenizations, one can use Pattern. It might be overkill here, but it is worth bearing in mind before you start writing chains of multiple replaceAll calls.
First we need to define, in regex, the tokens we want (rather than the delimiters we're splitting on). In this case that is simple: it's just digits, so \d.
So, to extract all digit-only values (no thousands or decimal separators) from an arbitrary String, one would do the following:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final List<Integer> tokens = new ArrayList<>();     // to hold the tokens
final Pattern pattern = Pattern.compile("\\d++");   // the compiled regex
final Matcher matcher = pattern.matcher(input);     // the matcher on input
while (matcher.find()) {                            // for each matched token
    tokens.add(Integer.parseInt(matcher.group()));  // parse as an int and store
}
N.B.: I have used a possessive quantifier (\d++) for efficiency.
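A possessive quantifier never gives back characters it has consumed, so the regex engine does not need to keep backtracking state for it. In the tokenizer nothing follows the digits in the pattern, so \d++ finds the same tokens as \d+. The difference only shows when something after the quantifier needs characters it has already taken, as in this illustrative sketch (class name hypothetical):

```java
class PossessiveDemo {
    public static void main(String[] args) {
        // Greedy \d+ can give back the final '5' so that the literal 5 matches.
        System.out.println("12345".matches("\\d+5"));  // true
        // Possessive \d++ keeps every digit it consumed and never backtracks,
        // so nothing is left for the literal 5 and the match fails.
        System.out.println("12345".matches("\\d++5")); // false
    }
}
```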
So, you see, the Pattern and Matcher loop is somewhat more complex than the simple replaceAll().split(), but it is much more extensible: you can use arbitrarily complex regexes to tokenize almost any input.
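As an illustration of that extensibility, only the regex needs to change to accept, say, negative numbers and arbitrary separators; the find() loop stays the same. This is a sketch under the assumption that an optional leading "-" is the only extension wanted; the class name and the pattern -?\d++ are hypothetical, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedIntTokenizer {
    // Hypothetical extension: an optional minus sign before the digits.
    private static final Pattern SIGNED_INT = Pattern.compile("-?\\d++");

    static List<Integer> tokenize(String input) {
        List<Integer> tokens = new ArrayList<>();
        Matcher matcher = SIGNED_INT.matcher(input);
        while (matcher.find()) {                           // for each matched token
            tokens.add(Integer.parseInt(matcher.group())); // parse as an int and store
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Separators are irrelevant: only the tokens themselves are matched.
        System.out.println(tokenize("[-882, 337; 42]")); // [-882, 337, 42]
    }
}
```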