8

I have a string that looks like this:

0,0,1,2,4,5,3,4,6

What I want returned is a String[] that was split after every 3rd comma, so the result would look like this:

[ "0,0,1", "2,4,5", "3,4,6" ]

I have found similar functions, but none of them split at every n-th comma.

VLAZ
knurb
  • 4
    Have you tried writing a function yourself to parse/split it? – Jashaszun Jul 26 '13 at 22:56
  • One approach that might be useful is to first change `0,0,1,2,4,5,3,4,6` into `0,0,1|2,4,5|3,4,6` which is a fairly simple regular expression replace translation. Or, just use a Matcher directly and walk it incrementally [as shown here](http://stackoverflow.com/a/1277171/2246674). – user2246674 Jul 26 '13 at 22:57
  • Two ways I can think of: use `indexOf` in a while loop or split on `,` and then glue the results back together again in groups of three. – flup Jul 26 '13 at 23:01
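The replace-then-split idea from the comments above could be sketched roughly like this. It is only a sketch under assumptions: the class and method names are made up for illustration, `|` is an arbitrary temporary marker, and fields are assumed to be non-empty and never to contain `|`.

```java
import java.util.Arrays;

class ReplaceThenSplit {
    // Hypothetical helper: turns every 3rd comma into '|' and then splits on '|'.
    // Assumes fields are non-empty and never contain '|'.
    static String[] splitEveryThirdComma(String data) {
        // Capture three comma-separated fields and replace the comma after them with '|'.
        String marked = data.replaceAll("([^,]+,[^,]+,[^,]+),", "$1|");
        return marked.split("\\|");
    }

    public static void main(String[] args) {
        // prints [0,0,1, 2,4,5, 3,4,6]
        System.out.println(Arrays.toString(splitEveryThirdComma("0,0,1,2,4,5,3,4,6")));
    }
}
```

Any fields left over after the last complete group of three simply stay together as the final element.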

4 Answers

23

NOTE: although the split-based solution below may work (last tested on Java 17), it relies on a bug: look-behind in Java is supposed to have an obvious maximum length. That limitation should in theory prevent us from using + inside it, but somehow \G at the start lets us use + here. If this bug is fixed in the future, the split approach will stop working.

A safer approach is to use Matcher#find, like this:

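import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "0,0,1,2,4,5,3,4,6";
Pattern p = Pattern.compile("\\d+,\\d+,\\d+"); // no look-behind needed
Matcher m = p.matcher(data);
List<String> parts = new ArrayList<>();
while (m.find()) {
    parts.add(m.group()); // each match is one group of three numbers
}
String[] result = parts.toArray(new String[0]);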

Alternatively, you can try the split method with the regex (?<=\\G\\d+,\\d+,\\d+),

Demo

String data = "0,0,1,2,4,5,3,4,6";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+),"); //Magic :) 
// to reveal magic see explanation below answer
for(String s : array){
    System.out.println(s);
}

output:

0,0,1
2,4,5
3,4,6

Explanation

  • \\d means one digit, same as [0-9], like 0 or 3
  • \\d+ means one or more digits like 1 or 23
  • \\d+, means one or more digits with comma after it, like 1, or 234,
  • \\d+,\\d+,\\d+ will accept three numbers with commas between them like 12,3,456
  • \\G means the end of the previous match or, if there is none (as on the first use), the start of the string
  • (?<=...), is a positive look-behind followed by a comma: it matches a comma , that is preceded by the text described in (?<=...)
  • (?<=\\G\\d+,\\d+,\\d+), therefore tries to find a comma that has three numbers before it, where those numbers are preceded either by the start of the string (like ^0,0,1 in your example) or by the previously matched comma (like 2,4,5 and 3,4,6).

Also, if you want to match characters other than digits, you can use a different character class, such as:

  • \\w which will match alphabetic characters, digits and _
  • \\S everything that is not white space
  • [^,] everything that is not comma
  • ... and so on. More info in Pattern documentation

By the way, this form works with split on every 3rd, 5th, 7th (and any other odd-numbered) comma; for example, split("(?<=\\G\\w+,\\w+,\\w+,\\w+,\\w+),") will split on every 5th comma.

To split on every 2nd, 4th, 6th, 8th (and any other even-numbered) comma, you will need to replace + with {1,maxLengthOfNumber}; for example, split("(?<=\\G\\w{1,3},\\w{1,3},\\w{1,3},\\w{1,3}),") splits on every 4th comma when numbers have at most 3 digits (0, 00, 12, 000, 123, 412, 999).

To split on every 2nd comma you can also use the regex split("(?<!\\G\\d+),"), based on my previous answer.

Pshemo
  • 3
    You could also replace \\d+ with [^,]* to make it work with anything that's not a comma. So it would work with "a,b,c,f,g,h,x,y,z" – agbinfo Jul 26 '13 at 23:41
  • @agbinfo Yes, true, but since OP was asking about digits I used `\\d`. Anyway nice additional info, will include it to answer. – Pshemo Jul 26 '13 at 23:50
  • @Pshemo Also, you may not realize this but a number of reputable sources say that you **cannot** do this kind of infinite lookbehind in Java... Only some finite form of variable lookbehind... So as a regex fan, this answer definitely deserves an upvote. See for instance, by no less than Jan Goyvaerts, [Java takes things a step further by allowing finite repetition. You still cannot use the star or plus](http://www.regular-expressions.info/lookaround.html). In fact even the dot-star or dot-plus, seem fine. Maybe a new Java version story (already there in Java 7). – zx81 Jun 10 '14 at 20:43
  • What if I want to split values at every 20th comma, or, let's say, that value is dynamic? Can't we use a variable for that nth number? – b22 Feb 23 '16 at 11:14
  • @b22 "on interval of 20th comma" then answer should explain it (if it is not clear could you point to part which confuses you?). "or lets say if that value is dynamic" it depends on what you think about dynamic. You can't change how regex works after you started using it, but you can use dynamic values while building it. If you are looking for something like `.split("(?<=\\G\\d{1,100}(,\\d{1,100}){"+n+"}),")` then unfortunately this will not work (it is hard to tell why regex can't figure out maximum length here since `n` will represent existing value). – Pshemo Feb 23 '16 at 17:44
  • @b22 I suspect that your best shot may be using this answer: http://stackoverflow.com/a/17892708/1393766 – Pshemo Feb 23 '16 at 17:44
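For a group size that is only known at run time, one option is to build the pattern for the Matcher#find approach from the number itself, which avoids look-behind entirely. The following is a rough sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes n >= 1 and non-empty fields (it uses [^,]+ rather than \\d+ so it also works for non-digit fields).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthCommaSplitter {
    // Hypothetical helper: splits data after every n-th comma.
    // Assumes n >= 1 and that every field is non-empty.
    static String[] splitEveryNthComma(String data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // One field, then greedily up to n-1 more ",field" pairs.
        Pattern p = Pattern.compile("[^,]+(?:,[^,]+){0," + (n - 1) + "}");
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [0,0,1, 2,4,5, 3,4,6]
        System.out.println(Arrays.toString(splitEveryNthComma("0,0,1,2,4,5,3,4,6", 3)));
    }
}
```

Because the quantifier is greedy, each match takes a full group of n fields when they are available, and a shorter final group is returned as the last element.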
8

Obligatory Guava answer:

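import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;

String input = "0,0,1,2,4,5,3,4,6";
String delimiter = ",";
int partitionSize = 3;

// Split on every comma, group the pieces in threes, then join each group back together.
for (Iterable<String> iterable : Iterables.partition(Splitter.on(delimiter).split(input), partitionSize)) {
    System.out.println(Joiner.on(delimiter).join(iterable));
}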

Outputs:

0,0,1
2,4,5
3,4,6
dnault
6

Try something like the below:

public String[] mySplitIntoThree(String str) 
{
    String[] parts = str.split(",");

    List<String> strList = new ArrayList<String>();

    // Note: trailing parts that do not fill a complete group of three are dropped.
    for(int x = 0; x < parts.length - 2; x += 3) 
    {
        String tmpStr = parts[x] + "," + parts[x+1] + "," + parts[x+2];

        strList.add(tmpStr);
    }

    return strList.toArray(new String[strList.size()]);
}

(You may need to import java.util.ArrayList and java.util.List)
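If leftover fields should be kept and the group size should be configurable, the same split-and-rejoin idea could be sketched as below. The class name, method name and groupSize parameter are assumptions made for illustration, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupJoiner {
    // Hypothetical helper: splits on every comma, then rejoins the fields
    // in groups of groupSize. A shorter final group is kept, not dropped.
    static String[] splitIntoGroups(String str, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be at least 1");
        }
        String[] parts = str.split(",");
        List<String> groups = new ArrayList<>();
        for (int start = 0; start < parts.length; start += groupSize) {
            // Clamp the end index so the last group may be shorter.
            int end = Math.min(start + groupSize, parts.length);
            groups.add(String.join(",", Arrays.copyOfRange(parts, start, end)));
        }
        return groups.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [0,0,1, 2,4,5, 3,4,6]
        System.out.println(Arrays.toString(splitIntoGroups("0,0,1,2,4,5,3,4,6", 3)));
    }
}
```

For example, splitIntoGroups("1,2,3,4", 3) yields "1,2,3" and "4".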

Adam Knights
3

Nice one for the coding dojo! Here's my good old-fashioned C-style answer:

If we call the bits between commas 'parts', and the results that get split off 'substrings', then:

n is the zero-based index of the part that ends at the current comma, i is the start of the next part, and startIndex is the start of the current substring.

Iterate over the parts and, after every third part, chop off a substring.

When you run out of commas, add the leftover part at the end to the result.

// x is the input string, e.g. "0,0,1,2,4,5,3,4,6"
List<String> result = new ArrayList<String>();
int startIndex = 0;
int n = 0;
for (int i = x.indexOf(',') + 1; i > 0; i = x.indexOf(',', i) + 1, n++) {
    if (n % 3 == 2) {
        result.add(x.substring(startIndex, i - 1));
        startIndex = i;
    }
}
result.add(x.substring(startIndex));
flup