1

My Java web service's input is a comma-separated list of strings ("ABC1,ABCD2,A1,A234B456,C1").

If my split threshold is 2 then I need to split it as

ABC1,ABCD2
A1,A234B456
C1

If my split threshold is 3 then I need to split it as
ABC1,ABCD2,A1
A234B456,C1

I'm trying to figure out a way of doing this with a Java regex. I looked through the StringUtils API but had no luck.

Gnana
  • The simplest solution would be to just use string split and rejoin the elements of the resulting array based on your threshold. While technically a bit slower than writing it completely from scratch, it is a lot more readable and is still in the same complexity class. Since you asked for a regex answer I'm posting this as a comment, but I recommend doing it this way. – LionC Aug 21 '14 at 14:12
  • It may be better to loop over the string and have a comma counter instead of using regex, though in theory if you constructed the regex string dynamically it could work. – RevanProdigalKnight Aug 21 '14 at 14:12
  • Thanks. If my input contains thousands of comma-separated strings, would this be an optimal solution? Yes, I tried to construct my regex dynamically but it didn't work for all cases. I checked this Stack Overflow question: http://stackoverflow.com/questions/17892284/split-only-after-comma-3-times-appear-in-java – Gnana Aug 21 '14 at 14:15
  • Why do you want to do this? Feels a little like an XY problem. – Duncan Jones Aug 21 '14 at 14:20
  • Yeah, don't use regex for this. Using `.split()` and a for-loop will be more maintainable and offer a broader range of solutions (so you can change your threshold to any number you want). Any regex will limit your threshold to only a few values. – skamazin Aug 21 '14 at 14:25
  • @Gnana As I said, using split and then rejoining elements is in the same complexity class as iterating yourself, so it scales as well. I'd suggest trying it; it is pretty short code. – LionC Aug 21 '14 at 14:26
  • Summary of other comments: If readability is important to you, then use split. If efficiency is important to you, then manually implement the parser. In either case, regex is probably not a good idea. – Cruncher Aug 21 '14 at 14:38
  • Thanks everyone. Let me discuss with the team about the regex and non-regex solutions. – Gnana Aug 21 '14 at 15:00
  • I took some measurements for a comma-separated list of 1000 items. The regex was slower than the for-loop, so I have decided to stay away from regex. – Gnana Aug 28 '14 at 16:36
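The comma-counter idea from the comments can be sketched roughly as follows. This is only an illustration, not code from any commenter: the class name `CommaCounterSplit` and the method `splitEvery` are hypothetical, and it assumes the threshold is at least 1. It walks the string once and cuts a chunk every time the comma count reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

class CommaCounterSplit {
    // Hypothetical helper: groups comma-separated items into chunks of `threshold` items
    // by scanning the input once and cutting at every `threshold`-th comma.
    static List<String> splitEvery(String input, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        List<String> chunks = new ArrayList<>();
        int chunkStart = 0; // index where the current chunk begins
        int commas = 0;     // commas seen since the current chunk began
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == ',' && ++commas == threshold) {
                chunks.add(input.substring(chunkStart, i)); // exclude the cutting comma
                chunkStart = i + 1;
                commas = 0;
            }
        }
        // Whatever is left after the last cut forms the final, possibly shorter, chunk
        if (chunkStart < input.length()) {
            chunks.add(input.substring(chunkStart));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints ABC1,ABCD2 / A1,A234B456 / C1 on separate lines
        for (String chunk : splitEvery("ABC1,ABCD2,A1,A234B456,C1", 2)) {
            System.out.println(chunk);
        }
    }
}
```

Because no intermediate array is built, this version allocates only the resulting chunks, which is the trade-off the comments describe between readability (split and rejoin) and efficiency (manual scan).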

4 Answers

2

You can use a regex like this:

((?:[^,]*,[^,]*|[^,]+){2})(?:,|$)

where the number 2 is threshold - 1 (so the pattern above is for a threshold of 3).

OUTPUT:

When threshold is 3:

ABC1,ABCD2,A1
A234B456,C1

When threshold is 2:

ABC1,ABCD2
A1,A234B456
C1

CODE:

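import java.util.regex.Matcher;
import java.util.regex.Pattern;

int threshold = 3;
String str = "piid1,piid2,piid3,piid4,piid5";
// Build the pattern for the chosen threshold; the quantifier is threshold - 1
Pattern p = Pattern.compile("((?:[^,]*,[^,]*|[^,]+){" + (threshold - 1) + "})(?:,|$)");
Matcher m = p.matcher(str);
// Group 1 holds each chunk without the trailing comma
while (m.find()) {
    System.out.println(m.group(1));
}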

Output:

piid1,piid2,piid3
piid4,piid5
anubhava
  • I tried `String list = "piid1,piid2,piid3,piid4,piid5"; String[] listarr = list.split("((?:[^,]*,[^,]*|[^,]+){2})(?:,|$)");` and it doesn't work. – Gnana Aug 21 '14 at 14:26
  • Also, I tried `((?:[^,]*,[^,]*|[^,]+){3})(?:,|$)` in regex101. It is not splitting as expected. – Gnana Aug 21 '14 at 14:27
  • No you cannot use this in `String#split` method. You will need to use `Pattern` and `Matcher` APIs to get matches out of it using `matcher.find()` – anubhava Aug 21 '14 at 14:28
  • @Gnana: I have provided your code also in my answer. – anubhava Aug 21 '14 at 14:38
  • It works like a charm. Thanks. Let me try to propose regex and non-regex options to my team. – Gnana Aug 21 '14 at 15:03
  • Glad to know it worked out. Can you mark the answer as accepted by clicking the tick mark at the top-left of my answer? – anubhava Aug 21 '14 at 15:26
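The point made in the comments is that `String#split` treats whatever the pattern matches as a delimiter and discards it, whereas `Matcher#find` returns the matched text itself. A rough sketch of the find-based approach follows. Note that it uses a different, simpler pattern than the answer's, `[^,]+(?:,[^,]+){0,threshold-1}`, and assumes every item is non-empty; the class name `RegexChunker` and the method `chunk` are hypothetical. As far as I can tell, the answer's own pattern can skip a final group that consists of a single one-character item (for example the `d` in `a,b,c,d` at threshold 3), which this variant avoids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexChunker {
    // Hypothetical helper: collects groups of up to `threshold` non-empty items.
    // Pattern: one item, then greedily up to threshold - 1 further ",item" parts.
    static List<String> chunk(String input, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        Pattern p = Pattern.compile("[^,]+(?:,[^,]+){0," + (threshold - 1) + "}");
        Matcher m = p.matcher(input);
        List<String> chunks = new ArrayList<>();
        // find() resumes after the previous match; the separating comma is skipped
        // because a match cannot start on a comma
        while (m.find()) {
            chunks.add(m.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints piid1,piid2,piid3 and then piid4,piid5
        for (String c : chunk("piid1,piid2,piid3,piid4,piid5", 3)) {
            System.out.println(c);
        }
    }
}
```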
0

This does the job for a threshold of 2:

(?:,[^,]*){1}(,)

For a threshold of 3, change the {1} to {2}; the quantifier is always the desired threshold - 1. The captured group (,) is the comma at which the string should be split.
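The answer does not say how the captured comma is meant to be used, so the following is only one possible reading: run the pattern with `Matcher#find` and cut the input at the position of group 1 each time. The class name `CapturedCommaSplit` and the method `splitAtCapturedCommas` are hypothetical, and the threshold is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturedCommaSplit {
    // Hypothetical helper: cuts the input at each comma captured by group 1
    // of the pattern (?:,[^,]*){threshold-1}(,)
    static List<String> splitAtCapturedCommas(String input, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        Pattern p = Pattern.compile("(?:,[^,]*){" + (threshold - 1) + "}(,)");
        Matcher m = p.matcher(input);
        List<String> chunks = new ArrayList<>();
        int chunkStart = 0;
        while (m.find()) {
            // m.start(1) is the index of the comma that ends the current chunk
            chunks.add(input.substring(chunkStart, m.start(1)));
            chunkStart = m.end(1);
        }
        // Items after the last captured comma form the final chunk
        if (chunkStart < input.length()) {
            chunks.add(input.substring(chunkStart));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints ABC1,ABCD2,A1 and then A234B456,C1
        for (String c : splitAtCapturedCommas("ABC1,ABCD2,A1,A234B456,C1", 3)) {
            System.out.println(c);
        }
    }
}
```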

asontu
0

This one is non-regex code:

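import java.util.ArrayList;
import java.util.List;

// Example values taken from the question; splits holds the individual items
String input = "ABC1,ABCD2,A1,A234B456,C1";
int threshold = 2;
String[] splits = input.split(",");

List<String> result = new ArrayList<String>();
String temp = "";

for (int i = 0; i < splits.length; i++) {
    // Append the current item, comma-separated within the group
    temp = temp + (temp.length() > 0 ? "," : "") + splits[i];

    // Close the group once it holds threshold items
    if (threshold - 1 == i % threshold) {
        result.add(temp);
        temp = "";
    }
}

// Add the leftover items when the count is not a multiple of threshold
// (isEmpty() instead of != "", which compares references, not contents)
if (!temp.isEmpty())
    result.add(temp);

for (String string : result) {
    System.out.println(string);
}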
Wundwin Born
0

I tried this non-regex approach:

// listarr holds the individual items, e.g. from list.split(",")
if (threshold != 0) {
    List<String> result = new ArrayList<String>();
    StringBuilder sb = new StringBuilder();
    int len = listarr.length;
    for (int i = 1; i <= len; i++) {
        sb.append(listarr[i - 1]);
        if (i % threshold == 0) {
            // Group is full: store it and start a new one
            result.add(sb.toString());
            sb.setLength(0);
        } else if (i != len) {
            sb.append(",");
        }
    }
    // Add the leftover group only if it is non-empty (avoids a trailing blank entry)
    if (sb.length() > 0) {
        result.add(sb.toString());
    }
    for (String string : result) {
        System.out.println(string);
    }
}

Gnana