
I use this regex to split a string at every, say, 3rd position:

String[] thisCombo2 = thisCombo.split("(?<=\\G...)");

where the number of dots after the \G sets the split interval: each dot matches one character, so here the 3 dots split the string every 3 positions. An example:

Input: String st = "123124125134135145234235245"
Output: 123 124 125 134 135 145 234 235 245.
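Why this works: \G matches at the end of the previous match (or at the start of the input), and (?<=...) is a zero-width lookbehind, so the pattern matches the empty position right after each run of three characters that began where the previous match ended. A minimal runnable sketch reproducing the example above; the class and method names are just for illustration:

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits s after every 3 characters using the \G-anchored lookbehind from the question.
    static String[] splitEveryThree(String s) {
        return s.split("(?<=\\G...)");
    }

    public static void main(String[] args) {
        String st = "123124125134135145234235245";
        // Prints [123, 124, 125, 134, 135, 145, 234, 235, 245]
        System.out.println(Arrays.toString(splitEveryThree(st)));
    }
}
```

If the length is not a multiple of 3, the leftover characters end up in a shorter final piece (for example "1234" gives [123, 4]).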

My question is: how do I let the user control the number of positions at which the string is split? In other words, how do I turn those 3 dots into n dots, with n chosen by the user?

Joey
Emile Beukes
  • Isn't it better to just use substring in a loop? – Aske B. Sep 06 '12 at 08:17
  • Related: [Split string to equal length substrings in Java](http://stackoverflow.com/questions/3760152/split-string-to-equal-length-substrings-in-java), [Splitting a string at every n-th character](http://stackoverflow.com/questions/2297347/splitting-a-string-at-every-n-th-character), [Java: How to split a string by a number of characters?](http://stackoverflow.com/questions/9276639/java-how-to-split-a-string-by-a-number-of-characters) – Aske B. Sep 06 '12 at 21:48

5 Answers


For a big performance improvement, an alternative would be to use substring() in a loop:

public String[] splitStringEvery(String s, int interval) {
    if (s.isEmpty()) {
        return new String[0]; // otherwise lastIndex would be -1 below
    }
    // Round up so a shorter trailing piece gets its own slot
    int arrayLength = (int) Math.ceil(s.length() / (double) interval);
    String[] result = new String[arrayLength];

    int j = 0;
    int lastIndex = result.length - 1;
    for (int i = 0; i < lastIndex; i++) {
        result[i] = s.substring(j, j + interval);
        j += interval;
    }
    // Add the last bit (may be shorter than interval)
    result[lastIndex] = s.substring(j);

    return result;
}

Example:

Input:  String st = "1231241251341351452342352456"
Output: 123 124 125 134 135 145 234 235 245 6.

It's not as short as stevevls' solution, but it's considerably faster (see below), and I think it would be easier to adjust in the future, depending on your situation.
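If you do need the defensive checks raised in the comments on this answer (null input, empty input, a non-positive interval), one possible variant is sketched below. It is not the author's code: the name splitEveryChecked is hypothetical, it returns a List instead of an array, and it computes each end index with Math.min instead of special-casing the last chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class ChunkSplitter {
    // Hypothetical defensive variant: validates the arguments, then takes
    // substrings of length `interval`, letting the final chunk be shorter.
    static List<String> splitEveryChecked(String s, int interval) {
        Objects.requireNonNull(s, "s");
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + interval);
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start < s.length(); start += interval) {
            // Clamp to the remaining length so the last substring stays in bounds
            int end = start + Math.min(interval, s.length() - start);
            result.add(s.substring(start, end));
        }
        return result; // an empty string yields an empty list
    }

    public static void main(String[] args) {
        // Prints [123, 124, 6]
        System.out.println(splitEveryChecked("1231246", 3));
    }
}
```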


Performance tests (Java 7u45)

2,000-character string, interval 3.

split("(?<=\\G.{" + count + "})") performance (in milliseconds):

7, 7, 5, 5, 4, 3, 3, 2, 2, 2

splitStringEvery() (substring()) performance (in milliseconds):

2, 0, 0, 0, 0, 1, 0, 1, 0, 0

2,000,000-character string, interval 3.

split() performance (in milliseconds):

207, 95, 376, 87, 97, 83, 83, 82, 81, 83

splitStringEvery() performance (in milliseconds):

44, 20, 13, 24, 13, 26, 12, 38, 12, 13

2,000,000-character string, interval 30.

split() performance (in milliseconds):

103, 61, 41, 55, 43, 44, 49, 47, 47, 45

splitStringEvery() performance (in milliseconds):

7, 7, 2, 5, 1, 3, 4, 4, 2, 1

Conclusion:

The splitStringEvery() method is considerably faster (even after the substring() implementation change in Java 7u6), and its advantage grows as the interval gets larger.
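The test code used for the numbers above is only available as an external link, so the following is not that code. It is a rough sketch, under my own assumptions (a 2,000,000-digit input, interval 3, ten runs), of how one might compare the two approaches with System.nanoTime. Single-run timings like these are sensitive to JIT warm-up and GC; a dedicated harness such as JMH gives more trustworthy numbers. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitTiming {
    // Regex approach from the other answer: {count} is filled in at runtime.
    static String[] splitWithRegex(String s, int count) {
        return s.split("(?<=\\G.{" + count + "})");
    }

    // Substring loop in the spirit of splitStringEvery(), using ceiling division.
    static String[] splitWithSubstring(String s, int interval) {
        int n = (s.length() + interval - 1) / interval;
        String[] result = new String[n];
        for (int i = 0; i < n; i++) {
            int start = i * interval;
            result[i] = s.substring(start, Math.min(start + interval, s.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Build a 2,000,000-character test string of digits
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2_000_000; i++) {
            sb.append((char) ('0' + i % 10));
        }
        String input = sb.toString();
        int interval = 3;

        // Sanity check: both approaches must agree before timing them
        if (!Arrays.equals(splitWithRegex(input, interval), splitWithSubstring(input, interval))) {
            throw new IllegalStateException("results differ");
        }

        for (int run = 0; run < 10; run++) {
            long t0 = System.nanoTime();
            splitWithRegex(input, interval);
            long t1 = System.nanoTime();
            splitWithSubstring(input, interval);
            long t2 = System.nanoTime();
            System.out.printf("run %d: split() %d ms, substring() %d ms%n",
                    run, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```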

Ready-to-use Test Code:

pastebin.com/QMPgLbG9

Aske B.
  • Isn't this just premature optimization? – thedayturns Sep 06 '12 at 23:00
  • @thedayturns Why are you posting that statement with a question mark? Don't be unsure of your accusations. It's one of those accusations that should be used against people who [waste their time with unnecessary performance improvements](http://programmers.stackexchange.com/a/79954/62391). Anyway, this is quickly written, ready-to-use code; easier to understand, to me at least; and on the plus side, it runs e.g. **60 times faster** in the last case (it grows *exponentially* with the interval). My whole performance research act may be unnecessary, but now it's there for generations to come. – Aske B. Sep 07 '12 at 06:58
  • Good response. I thought about it, and I think you're right - the highest voted answer is probably even more confusing than this one. On the other hand, the google guava solution is better than both you're fine with including another library. – thedayturns Sep 08 '12 at 07:35
  • @thedayturns If you mean "***if*** *you're fine with including another library*" then I agree. It's a very [elegant solution](http://stackoverflow.com/a/12295789/1380710), but I don't think it's the majority that wants to include an external library just for one functionality. – Aske B. Sep 08 '12 at 09:50
  • Yep. Caught my typo after the 5 minute deadline, whoops. – thedayturns Sep 08 '12 at 20:40
  • With the recent changes to substring's performance, I wonder if this is still fastest. Has anyone tried comparing these using Java 7 instead of Java 6? – Dennis Meng Jan 13 '14 at 05:29
  • @DennisMeng I just tested it out, using the test code I provided, and it has slightly different results. I'll update the answer with the new results. Regardless, I would be surprised if substring ever became slow enough to match the regex. – Aske B. Jan 13 '14 at 16:43
  • I think you should check that the input string is non-empty - otherwise you will access `result[-1]` and get an `ArrayIndexOutOfBoundsException` on the "add the last bit" line for an empty string. – Zout Apr 11 '16 at 18:59
  • @Zout You are right. You could also get a `NullPointerException` if the string is null. Probably also some weird behavior if the `interval` is `0` or negative. I think this is far beyond what the OP asked though. Defensive programming can be good in some circumstances, but it's not necessary in most cases. Hopefully people will figure out what they need in their own case. Or seek the knowledge about how to handle this in respective questions. – Aske B. Apr 12 '16 at 07:10
  • In my use case, the string was coming from user input, but the interval was predefined (so we can guarantee the string is non-null and that the interval is > 0) so I think it would make sense to check for isEmpty. I can imagine other situations where the interval would also be user defined, so I see your point. – Zout Apr 12 '16 at 12:31

You can use the brace quantifier, {n}, to specify how many times the preceding token must occur:

String[] thisCombo2 = thisCombo.split("(?<=\\G.{" + count + "})");

The brace quantifier is handy because it can express either an exact count, such as {3}, or a range, such as {2,4}.
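A small sketch of both forms of the quantifier, plus the split from this answer with the count supplied at runtime (the class and method names are only for illustration):

```java
import java.util.Arrays;

public class BraceQuantifierDemo {
    // Splits s every `count` characters; count is inserted into the {n} quantifier at runtime.
    static String[] splitEvery(String s, int count) {
        return s.split("(?<=\\G.{" + count + "})");
    }

    public static void main(String[] args) {
        // Exact count: .{3} matches exactly three characters
        System.out.println("abc".matches(".{3}"));    // true
        System.out.println("abcd".matches(".{3}"));   // false
        // Range: .{2,4} matches two to four characters
        System.out.println("abcd".matches(".{2,4}")); // true
        // Count chosen at runtime, e.g. read from the user
        System.out.println(Arrays.toString(splitEvery("abcdefghij", 4))); // [abcd, efgh, ij]
    }
}
```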

stevevls

Using Google Guava, you can use Splitter.fixedLength():

Returns a splitter that divides strings into pieces of the given length

Splitter.fixedLength(2).split("abcde");
// returns an iterable containing ["ab", "cd", "e"].
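If you would rather not add Guava for this one feature (a concern raised in the comments on the first answer), a plain-JDK approximation using Java 8 streams is sketched below. It is not Guava's implementation, the name fixedLength here is hypothetical, and it may differ from Guava in edge cases such as the empty string.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FixedLengthSplit {
    // Hypothetical JDK-only stand-in for Splitter.fixedLength(n).split(s):
    // consecutive pieces of length n, the last one possibly shorter.
    static List<String> fixedLength(int n, String s) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int pieces = (s.length() + n - 1) / n; // ceiling division
        return IntStream.range(0, pieces)
                .mapToObj(i -> s.substring(i * n, Math.min((i + 1) * n, s.length())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fixedLength(2, "abcde")); // [ab, cd, e]
    }
}
```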
epoch

If you want to build the regex string yourself, you can pass the split length as a parameter:

public String getRegex(int splitLength)
{
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < splitLength; i++)
        builder.append(".");

    return "(?<=\\G" + builder.toString() +")";
}
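On Java 11 or later, String.repeat can build the run of dots without the explicit loop; the result is equivalent to using the {n} quantifier. A minimal sketch, assuming Java 11+ and adding a guard for non-positive lengths (my own addition, not part of the answer above):

```java
public class RegexBuilder {
    // Java 11+: String.repeat builds the run of dots without a loop
    static String getRegex(int splitLength) {
        if (splitLength <= 0) {
            throw new IllegalArgumentException("splitLength must be positive");
        }
        return "(?<=\\G" + ".".repeat(splitLength) + ")";
    }

    public static void main(String[] args) {
        System.out.println(getRegex(3)); // (?<=\G...)
        System.out.println(String.join(" ", "123124125".split(getRegex(3)))); // 123 124 125
    }
}
```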
The Cat
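private String[] StringSpliter(String OriginalString, int nthPosition) {
    // Build the pattern with the chunk length supplied at runtime
    String regex = "(?<=\\G.{" + nthPosition + "})";
    StringBuilder newString = new StringBuilder();
    for (String s : OriginalString.split(regex)) {
        // Collect the pieces separated by "/" (assumes the input contains no '/')
        newString.append(s).append("/");
    }
    return newString.toString().split("/");
}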
newMaziar