
I am writing an application that relies heavily on separating large strings into individual words. Because I have to deal with so many strings, I am concerned about efficiency. I am currently using String.split, but I do not know whether there is a more efficient way to do this.

private static String[] printWords(String input) {
    String[] splitWords = input.split(" ");
    return splitWords;
}
user3282276
  • I strongly doubt whether there's any way of coding `String.split` that's more efficient than `String.split` itself. Those Oracle guys know what they're doing. – Dawood ibn Kareem Feb 21 '14 at 03:41
  • you can try regex to find words instead – ManZzup Feb 21 '14 at 03:42
  • By "efficiency", do you mean time or space? – Code-Apprentice Feb 21 '14 at 03:42
  • I agree with @DavidWallace, but would add that you shouldn't be asking this question unless you have an actual *problem*, backed up with experimental data showing that this code is a bottleneck. – Bohemian Feb 21 '14 at 03:43
  • `String.split()` is very likely the best you can do. That said, if after setting customer-focused performance goals you find that your program is a) not meeting those goals, and b) the main reason for that is `String.split()`, then you may want to take a look at alternative approaches. For example, splitting a string is likely to be parallelizable: split the string near the 1/4, 1/2 and 3/4 points (easy to do given that you know the length) and then farm the work out among a number of threads. – dlev Feb 21 '14 at 03:52
  • Farming work out to threads will not always speed it up, unless the work involves wait states. You can try it, but naive attempts often run slower rather than faster. – keshlam Feb 21 '14 at 04:10
  • @DavidWallace: I strongly disagree. There's no way to handle all special cases efficiently, see my answer. – maaartinus Feb 21 '14 at 05:35

2 Answers


When I timed it a few years ago (Java 6), String.split() was significantly slower than searching for individual space characters with indexOf(), because the former carries a lot of regex baggage.

If your sentences always split on a space (somewhat dubious?) and performance is truly an issue (run some real tests), custom code would be faster.
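A minimal sketch of what such custom code might look like, assuming the only delimiter is the space character. The class and method names (`SpaceSplitter`, `splitOnSpace`) are made up for illustration. Note one deliberate difference from `input.split(" ")`: this version drops the empty tokens produced by consecutive spaces. Whether it is actually faster on your data and Java version is something to measure, not assume.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceSplitter {

    // Hypothetical helper: walks the string with indexOf(' ') instead of
    // going through the regex machinery. Empty tokens (from consecutive,
    // leading, or trailing spaces) are skipped.
    static List<String> splitOnSpace(String input) {
        List<String> words = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(' ', start)) >= 0) {
            if (end > start) {
                words.add(input.substring(start, end));
            }
            start = end + 1; // continue just past the space
        }
        if (start < input.length()) {
            words.add(input.substring(start)); // last word, no trailing space
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [the, quick, brown, fox]
        System.out.println(splitOnSpace("the quick  brown fox"));
    }
}
```

Returning a `List<String>` avoids the final copy into an array; if the caller needs a `String[]`, `words.toArray(new String[0])` adds that cost back.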

Following the link provided in David Ehrmann's comment, it looks like Java 7 made some speedups. My tests were with Java 6.

user949300

While the Sun/Oracle guys did a decent job in general, there's still room for improvement, especially because you can specialize for your concrete problem. Sometimes you hit a case where a huge speedup factor is achievable if you don't rely on the JITC to do the whole job perfectly out of the box. Such cases are rare, but they exist.

For example, String.split calls Pattern.compile in the general case, so a precompiled Pattern is a sure win there.
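As a rough sketch of the precompiled approach: compile the pattern once, keep it in a constant, and reuse it for every string. The names `PrecompiledSplit`, `WHITESPACE` and `splitWords` are illustrative, and the `\s+` delimiter is an assumption chosen because it is a multi-character regex, which is the case where `String.split` would otherwise compile a pattern on each call.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplit {

    // Compiled once and reused. Pattern instances are immutable and
    // safe to share between threads.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: same result as input.split("\\s+"),
    // without recompiling the regex on every call.
    static String[] splitWords(String input) {
        return WHITESPACE.split(input);
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(Arrays.toString(splitWords("a  b\tc")));
    }
}
```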

There's an optimization for single-character patterns that avoids the regex overhead, so the possible gain is limited. Still, I'd try Guava's Splitter and a hand-crafted solution if performance is really important.

You'll probably find that splitting on a single space is not what you want, and then the gain will be bigger.
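To illustrate why a single space may not be the right delimiter, here is a small comparison on an input with a double space and a tab (the class and method names are hypothetical). Splitting on `" "` leaves an empty token and does not break at the tab, while `"\\s+"` treats any run of whitespace as one delimiter. The latter is a real regex, so it presumably falls outside the single-character optimization mentioned above.

```java
public class SplitComparison {

    // Splits on exactly one space character.
    static String[] onSingleSpace(String input) {
        return input.split(" ");
    }

    // Splits on any run of whitespace (spaces, tabs, newlines).
    static String[] onWhitespaceRuns(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "one  two\tthree";
        // 3 tokens: "one", "" and "two\tthree"
        System.out.println(onSingleSpace(input).length);
        // 3 tokens: "one", "two" and "three"
        System.out.println(onWhitespaceRuns(input).length);
    }
}
```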

maaartinus
  • [String is apparently optimized for single character split](http://www.docjar.com/html/api/java/lang/String.java.html#2311). My concern would be with the array size and the substring overhead, especially with substring being O(n) on some Java versions, now. – David Ehrmann Feb 21 '14 at 05:48