23

I want to split the string "aaaabbbccccaaddddcfggghhhh" into "aaaa", "bbb", "cccc", "aa", "dddd", "c", "f" and so on.

I tried this:

String[] arr = "aaaabbbccccaaddddcfggghhhh".split("(.)(?!\\1)");

But this consumes one character at every split point, so with the above regular expression I get "aaa" as the first string while I want "aaaa".

How do I achieve this?

Óscar López
Lokesh
  • 1
    @Adri1du40: I am open to other options but don't want to use a loop. – Lokesh May 07 '14 at 16:52
  • Check this question: http://stackoverflow.com/questions/15101577/split-string-when-character-changes-possible-regex-solution – Tofandel May 07 '14 at 16:57
  • I'm not a Java guy, but wouldn't `string.split()` be slower than a loop? – Amal Murali May 07 '14 at 17:49
  • @AmalMurali would be less readable too. I don't know about you but reading this regex `(?<=(.))(?!\\1)` is going to make me scratch my head. – Cruncher May 07 '14 at 19:04
  • This is trivially done in Haskell: `group "aaaabbbccccaaddddcfggghhhh"` returns the expected result `["aaaa","bbb","cccc","aa","dddd","c","f","ggg","hhhh"]`... – Bakuriu May 07 '14 at 20:50
  • 1
    Possible duplicate of [Split regex to extract Strings of contiguous characters](http://stackoverflow.com/questions/13596454/split-regex-to-extract-strings-of-contiguous-characters) – maxxyme Jan 12 '17 at 08:50

3 Answers

31

Try this:

import java.util.Arrays;

String   str = "aaaabbbccccaaddddcfggghhhh";
String[] out = str.split("(?<=(.))(?!\\1)");

System.out.println(Arrays.toString(out));
// => [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]

Explanation: we want to split the string between groups of identical characters, so we need to find the boundary between each pair of adjacent groups. The pattern uses Java's syntax for a positive look-behind to capture the previous character, then a negative look-ahead with a back reference to verify that the next character is not the same as the previous one. No characters are consumed, because the pattern consists only of two look-around assertions (that is, the regular expression is zero-width).
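To make the difference concrete, here is a minimal sketch that runs the consuming pattern from the question next to the zero-width one. The class name `RunSplitter` and the method `splitRuns` are made up for illustration; only `String.split` and `Arrays.toString` come from the JDK.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
final class RunSplitter {
    // Zero-width boundary: the look-behind captures the previous char,
    // the look-ahead requires that the next char is not that same char.
    private static final String BOUNDARY = "(?<=(.))(?!\\1)";

    static String[] splitRuns(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String str = "aaaabbbccccaaddddcfggghhhh";

        // Pattern from the question: "(.)" consumes the last char of each run,
        // so that char is dropped from the pieces.
        System.out.println(Arrays.toString(str.split("(.)(?!\\1)")));

        // Look-around version: nothing is consumed, every run stays whole.
        System.out.println(Arrays.toString(splitRuns(str)));
    }
}
```

The second line printed is `[aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]`; the first shows every run shortened by one character.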

Óscar López
5

What about capturing in a lookbehind?

(?<=(.))(?!\1|$)

As a Java string:

(?<=(.))(?!\\1|$)
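
The `|$` alternative stops the pattern from matching at the very end of the input. With plain `split(regex)` this makes no visible difference, since Java drops trailing empty strings anyway, but it shows up once a negative limit is passed. A small sketch, assuming a hypothetical class `TrailingBoundaryDemo` and helper `splitKeepingTrailing`:

```java
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
final class TrailingBoundaryDemo {
    static final String WITHOUT_END_GUARD = "(?<=(.))(?!\\1)";
    static final String WITH_END_GUARD = "(?<=(.))(?!\\1|$)";

    // A negative limit tells String.split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        String str = "aabbb";

        // Also matches at the end of the input, leaving an empty last element.
        System.out.println(Arrays.toString(splitKeepingTrailing(str, WITHOUT_END_GUARD)));
        // prints [aa, bbb, ]

        // "$" inside the negative look-ahead rejects the end-of-input position.
        System.out.println(Arrays.toString(splitKeepingTrailing(str, WITH_END_GUARD)));
        // prints [aa, bbb]
    }
}
```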
Jonny 5
1

Here I take each character in turn and print it. The `if` statement checks two conditions: that the next index is still within the bounds of the array, and that the next character differs from the current one. If both hold, a line break is printed so the next group starts on a new line; otherwise the loop simply continues.

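// Assumption: arr holds the characters of the input string.
char[] arr = "aaaabbbccccaaddddcfggghhhh".toCharArray();

for (int i = 0; i < arr.length; i++) {
    char chr = arr[i];
    System.out.print(chr);
    // Break the line when the next character starts a new group.
    if (i + 1 < arr.length && arr[i + 1] != chr) {
        System.out.print(" \n");
    }
}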
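
If the groups are needed as values rather than as printed lines, the same boundary check can collect them into a list. This is only a sketch of that idea; the class `RunGrouper` and the method `groupRuns` are hypothetical names, and it compares `char` values, so it does not treat surrogate pairs specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the loop above that returns the groups.
final class RunGrouper {
    static List<String> groupRuns(String input) {
        List<String> runs = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 0; i < input.length(); i++) {
            boolean isLast = i + 1 == input.length();
            // A run ends at the last char or when the next char differs.
            if (isLast || input.charAt(i + 1) != input.charAt(i)) {
                runs.add(input.substring(start, i + 1));
                start = i + 1;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(groupRuns("aaaabbbccccaaddddcfggghhhh"));
        // prints [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
    }
}
```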
Py-Coder