
How would I split a string into equal-sized parts using String.split(), given that the string contains only digits? For the sake of example, each part below has size 5.

"123" would be split into "123"
"12345" would be split into "12345"
"123451" would be split into "12345" and "1"
"123451234512345" would be split into "12345", "12345" and "12345"
etc

These are to be put in an array:

String myString = "12345678";
String[] myStringArray = myString.split(???);
//myStringArray => "12345", "678";

I'm unsure which regex to use, or how to separate the string into equal-sized chunks.

gator

1 Answer


You can try it this way:

String input = "123451234512345";
String[] pairs = input.split("(?<=\\G\\d{5})");
System.out.println(Arrays.toString(pairs));

Output:

[12345, 12345, 12345]

This regex uses a positive lookbehind, (?<=...), together with \G (written \\G in a Java string literal). \G matches the place where the previous match ended or, if there is no previous match yet (when matching has just started), the start of the string, like ^.

So the regex matches any position that has five digits immediately before it, where those five digits begin at the position we previously split on (or at the start of the string).
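
To make the chunk size a parameter, the same pattern can be built from a variable. The following is a minimal sketch, not part of the original answer: the class name `ChunkSplit` and the helper `splitIntoChunks` are hypothetical, and it assumes the input contains only digits, as in the question.

```java
import java.util.Arrays;

class ChunkSplit {
    // Hypothetical helper (an assumption for illustration): splits a digit
    // string into pieces of `size` characters; the last piece may be shorter.
    static String[] splitIntoChunks(String digits, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // Zero-width split point: a position preceded by exactly `size` digits
        // that begin where the previous match ended (\G), or at the start of
        // the input for the first match.
        String regex = "(?<=\\G\\d{" + size + "})";
        return digits.split(regex);
    }

    public static void main(String[] args) {
        // The inputs listed in the question, with a chunk size of 5.
        for (String s : new String[] {"123", "12345", "123451", "123451234512345"}) {
            System.out.println(s + " -> " + Arrays.toString(splitIntoChunks(s, 5)));
        }
    }
}
```

Because the lookbehind is zero-width, no characters are lost at the split points, and String.split() drops the trailing empty string that would otherwise appear when the input length is an exact multiple of the chunk size.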

Pshemo
  • +1 An explanation might help better :) – Rohit Jain Mar 09 '14 at 15:13
  • @RohitJain was in the middle of writing it :) – Pshemo Mar 09 '14 at 15:16
  • +1 spot on with use of `\\G` – anubhava Mar 09 '14 at 15:17
  • Cheers. Will this also handle the case where the `input` has fewer than 5 digits? – gator Mar 09 '14 at 15:18
  • @riista Yes. Just try it? – Rohit Jain Mar 09 '14 at 15:19
  • Worth mentioning here that lookarounds, like all other anchors, do not consume any text; this is necessary since `.split()` normally consumes the delimiter – fge Mar 09 '14 at 15:20
  • @riista Yes. The result for `"123"` is `[123]`. This regex splits only after the first 5 digits, so if there are fewer than 5 digits it will not split at all. – Pshemo Mar 09 '14 at 15:20
  • Worth also mentioning what `\G` really does, that is "start where previous match left off"; since the regex doesn't consume any text, if there were no `\G` here, it would start at the next character, since regex engines shift by one character if the match is empty (which it is here). But that may be quite a lot to explain... In any case, +1 – fge Mar 09 '14 at 15:23
  • @fge Thanks, as always your comments are very helpful. +1 – Pshemo Mar 09 '14 at 15:26
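
fge's point about `\G` can be checked with a small comparison. This is only a sketch under assumptions: the class name `WithoutG` and both helper methods are hypothetical and do not come from the thread. Without `\G`, every position that has five digits before it is a split point, so the input is cut after each character once the first five have passed.

```java
import java.util.Arrays;

class WithoutG {
    // Hypothetical helper: lookbehind only, no \G anchor.
    static String[] splitWithoutG(String input) {
        return input.split("(?<=\\d{5})");
    }

    // Hypothetical helper: the pattern from the answer, anchored with \G.
    static String[] splitWithG(String input) {
        return input.split("(?<=\\G\\d{5})");
    }

    public static void main(String[] args) {
        String input = "1234512345";
        // Splits after index 5, 6, 7, ... because each of those positions
        // is preceded by five digits.
        System.out.println(Arrays.toString(splitWithoutG(input))); // [12345, 1, 2, 3, 4, 5]
        // Splits only where five digits follow the previous split point.
        System.out.println(Arrays.toString(splitWithG(input)));    // [12345, 12345]
    }
}
```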