5

I have some logic that splits a String into chunks of 4 characters each, like this:

0108011305080508000000 gives 0108 0113 0508 0508 0000 00

The logic I used is

String[] splitErrorCode = inputString.split("(?<=\\G....)");

It works great in Java, but when I run it in Android I get the wrong output.

0108 011305080508000000

I have no clue what is going wrong. After going through String's split function, I realized that Android uses fastSplit, whereas the Java version has a much larger body of splitting logic.

Aren't both functions supposed to work identically? Why is this a problem? Any comments or suggestions?

iZBasit
    Great question (+1). I'm amazed that a method as basic as `split` can behave differently in android and Java. Does anyone know where there is a list of differences? I would ask it as a question, but I know it would be closed in seconds. – Paul Boddington Dec 29 '14 at 15:55

2 Answers

4

\G in Java mimics the Perl construct:

http://perldoc.perl.org/perlfaq6.html#What-good-is-%5CG-in-a-regular-expression:

You use the \G anchor to start the next match on the same string where the last match left off.
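As a small illustration of that description, the sketch below uses \G with a plain Matcher loop on a standard JVM. The class and method names are made up for this example, and it only demonstrates the anchor's documented meaning in java.util.regex; it is not a claim about how Android's regex engine treats \G inside a lookbehind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the question or the answers.
final class GAnchorDemo {

    // Collects every match of the given regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Each match must begin exactly where the previous one ended,
        // so the input is consumed in contiguous chunks of up to 4 characters.
        System.out.println(findAll("\\G.{1,4}", "0108011305080508000000"));
        // prints [0108, 0113, 0508, 0508, 0000, 00]

        // Matching stops at the first position where the pattern fails:
        // "56" is never reached because "ab" breaks the chain.
        System.out.println(findAll("\\G\\d{2}", "1234ab56"));
        // prints [12, 34]
    }
}
```

The second call shows what the anchor adds: without \G, the same pattern would also find "56".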

Support for this was very poor. This construct is documented in Python to be used in negative variable-length lookbehinds to limit how far back the lookbehind goes. Explicit support was added.

However, the split method in JDK 7 has a fast path for the common case where the regex is a single literal character. This avoids the need to compile a Pattern or use the regex engine at all. Here is the method (detailed source redacted):

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
         /* Fast path checks as documented in the comment */ )
    {
        // Fast path computation redacted!
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

And before JDK 7, the method was simply:

public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
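To make the fast-path idea concrete, here is a rough sketch of splitting on one literal character without involving the regex engine. It is an illustration only, not the JDK or Android source; `splitOnChar` and the class name are hypothetical. It drops trailing empty strings, as split(regex) does with its default limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a regex-free split on a single literal character.
final class FastPathSketch {

    static String[] splitOnChar(String s, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int next;
        // Walk the string with indexOf; no Pattern is ever compiled.
        while ((next = s.indexOf(delimiter, start)) != -1) {
            parts.add(s.substring(start, next));
            start = next + 1;
        }
        if (start == 0) {
            // Delimiter never found: the whole input is the only element.
            return new String[] { s };
        }
        parts.add(s.substring(start));

        // Drop trailing empty strings, like split(regex) with the default limit.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String part : splitOnChar("a,b,,c,,", ',')) {
            System.out.println("[" + part + "]");
        }
    }
}
```

A pattern such as (?<=\G....) is longer than two characters, so it cannot meet either condition in the quoted comment and always goes through Pattern.compile in the JDK 7 method.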

While this fast path exists, note that Android programs must be source-compatible with Java 6. The Android environment cannot take advantage of this fast path, so it delegates to fastSplit instead and loses support for some Perl constructs, such as \A.

As for why they moved away from the traditional always-regex path, the reason is fairly self-evident: a simple delimiter does not need the cost of compiling and running a regex.

Unihedron
3

Instead of splitting the way you do (which, by the way, recompiles the pattern on every split operation), just do it like this; it's simpler and performs better:

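import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once and reused, instead of being recompiled on every call.
private static final Pattern ONE_TO_FOUR_DIGITS = Pattern.compile("\\d{1,4}");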

// ...

public List<String> splitErrorCodes(final String input)
{
    final List<String> ret = new ArrayList<>(input.length() / 4 + 1);
    final Matcher m = ONE_TO_FOUR_DIGITS.matcher(input);

    while (m.find())
        ret.add(m.group());

    return ret;
}

Of course, an additional check would need to be performed on the shape of input as a whole but it's really not hard to do. Left as an exercise ;)
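One possible take on that exercise is sketched below. What counts as a valid input is an assumption here (a non-empty string of ASCII digits only), and the class name and the `ALL_DIGITS` pattern are hypothetical additions, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper that adds an input check to the Matcher-based approach.
final class ErrorCodeSplitter {

    // Assumption: valid input is one or more ASCII digits and nothing else.
    private static final Pattern ALL_DIGITS = Pattern.compile("[0-9]+");
    private static final Pattern ONE_TO_FOUR_DIGITS = Pattern.compile("[0-9]{1,4}");

    static List<String> splitErrorCodes(String input) {
        // Reject anything that is not purely digits before chunking it.
        if (input == null || !ALL_DIGITS.matcher(input).matches()) {
            throw new IllegalArgumentException("expected a non-empty digit string");
        }
        List<String> ret = new ArrayList<>(input.length() / 4 + 1);
        Matcher m = ONE_TO_FOUR_DIGITS.matcher(input);
        while (m.find()) {
            ret.add(m.group());
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(splitErrorCodes("0108011305080508000000"));
        // prints [0108, 0113, 0508, 0508, 0000, 00]
    }
}
```

Without the up-front check, an input like "01x8" would silently yield "01" and "8", since find() simply skips characters that do not match.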

fge
  • Thanks a ton @fge. Your solution is working great; I will eventually mark your answer as accepted. I am waiting for people to comment on why such a simple split method has different implementations and doesn't work on Android. – iZBasit Dec 29 '14 at 17:05