1

I have a question about the performance of Java's String.indexOf(String subString).

I've written a class to compare the performance of calling String.indexOf(String subString) compare to copying the source from inside String's source and calling the internal indexOf() using exactly the same arguments.

The performance appears to be about 4x better when calling String.indexOf() directly, despite the call stack would be 2 frames deeper.

My JVM is JDK1.7.0_40 64bit (windows hotspot). My machine is running Windows with i7-4600U CPU with 16GB ram.

Here is the code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class TestIndexOf implements Runnable {

  final static String s0 = "This is my search string, it is pretty long so can test the speed of the search";
  final static String s1 = "speed of the search";
  final static char[] c0 = s0.toCharArray();
  final static char[] c1 = s1.toCharArray();
  final static byte[] b0 = s0.getBytes();
  final static byte[] b1 = s1.getBytes();

  static AtomicBoolean EXIT = new AtomicBoolean(false);
  static AtomicLong TOTAL = new AtomicLong(0);

  @Override
  public void run() {
    long count = 0;
    try {
      for (;;) {
        // Case 1, search as byte[]
        int idx = indexOf(b0, 0, b0.length, b1, 0, b1.length, 0);
        // Case 2, search as char[]
        // int idx = indexOf(c0, 0, c0.length, c1, 0, c1.length, 0);
        // Case 3, search as String (using String.indexOf())
        // int idx = s0.indexOf(s1);
        if (idx >= 0) {
          count ++;
        }
        if (EXIT.get()) {
          break;
        }
      }
      TOTAL.addAndGet(count);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }

  /* byte version of indexOf, modified from Java JDK source */
  static int indexOf(byte[] source, int sourceOffset, int sourceCount,
      byte[] target, int targetOffset, int targetCount,
      int fromIndex) {
    if (fromIndex >= sourceCount) {
      return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
      fromIndex = 0;
    }
    if (targetCount == 0) {
      return fromIndex;
    }

    byte first  = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
      /* Look for first character. */
      if (source[i] != first) {
        while (++i <= max && source[i] != first) {
          ;
        }
      }

      /* Found first character, now look at the rest of v2 */
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = targetOffset + 1; j < end && source[j] ==
            target[k]; j++, k++) {
          ;
        }

        if (j == end) {
          /* Found whole string. */
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }

  /* char version of indexOf, directly copied from JDK's String class */
  static int indexOf(char[] source, int sourceOffset, int sourceCount,
      char[] target, int targetOffset, int targetCount,
      int fromIndex) {
    if (fromIndex >= sourceCount) {
      return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
      fromIndex = 0;
    }
    if (targetCount == 0) {
      return fromIndex;
    }

    char first  = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
      /* Look for first character. */
      if (source[i] != first) {
        while (++i <= max && source[i] != first) {
          ;
        }
      }

      /* Found first character, now look at the rest of v2 */
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = targetOffset + 1; j < end && source[j] ==
            target[k]; j++, k++) {
          ;
        }

        if (j == end) {
          /* Found whole string. */
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }


  public static void main(String[] args) throws Exception {
    int threads = 4;
    ExecutorService executorService = Executors.newFixedThreadPool(threads);
    for(int i=0; i<threads; i++) {
      executorService.execute(new TestIndexOf());
    }
    Thread.sleep(10000);
    EXIT.set(true);
    System.out.println("STOPPED");
    Thread.sleep(1000);
    System.out.println("Count = " + TOTAL.get());
    System.exit(0);
  }
}

The results I got was: (2 samples, ran for 10 seconds, with 4 threads)

byte[] 224848726 225011695

char[] 224707442 224707442

String 898161092 897897572

What is the magic with String.indexOf()? Does this get hardware acceleration? :P

javaPhobic
  • 476
  • 4
  • 12
  • First guess would be benchmarking shenanigans. Second would be special treatment of `String`s -- I've heard that the JVM (or some of them, at least) do special things with `String`s, so it wouldn't surprise me that much. I really don't know at a quick glance though. – awksp Jun 17 '14 at 05:23
  • Continuing what user3580294 said, although the Java library source is available for reference, there is no guarantee that it matches the actual binary implementation. My first-order guess would be that the String class uses many more native routines than it lets on. – Mad Physicist Jun 17 '14 at 05:37
  • True - but decompiling the java.lang.String in rt.jar in the JDK tends to show this is really implemented in Java - at least for the indexOf() method. I cannot see any native calls in the path, unlike other classes in rt.jar. – javaPhobic Jun 17 '14 at 05:56
  • The little I've heard is that the JVM (or at least one of them) hardcodes some expectations about the `String.class` file it loads, so that if you try to insert a function in the middle you'll end up with JVM crashes. The performance difference *could* be related to that, but I'm not sure at all. – awksp Jun 17 '14 at 05:58
  • Multithreading is surely not good for benchmarking. Another problem: You're missing any warmup (letting the JVM optimize and compile the code before measurement). "**[Doing it that way, you could very well let your cat estimate the results](http://nitschinger.at/Using-JMH-for-Java-Microbenchmarking)**". – maaartinus Jun 17 '14 at 06:13
  • Can you elaborate on why multithreading is not a good way for benchmarking? I've tried running with single thread over 5 minutes but results are similar. Thanks. – javaPhobic Jun 17 '14 at 06:44
  • Actually, you;d better start a new JVM instance for each test, as mere loading a class can influence performance (see the influence of `preload` in this [benchmark of mine](http://stackoverflow.com/q/24222991/581205)). Multithreading makes it all even more non-deterministic. Sometimes, it may be fine, sometimes not... and you can be never sure. So I (and all the benchmarking guys) prefer waiting longer to risking wasting *own* time with possible nonsense results. – maaartinus Jun 17 '14 at 09:53

1 Answers1

5

JVM has specific optimizations for some methods from standard library. One of them will replace calls to String.indexOf with efficient inline assembly. It can even take advantage of SSE4.2 instructions. This is, most likely, causing this difference.

For details see: src/share/vm/opto/library_call.cpp

Piotr Praszmo
  • 17,928
  • 1
  • 57
  • 65
  • this is the correct answer, it's called 'intrinsics' - some of the jdk methods may have way better impl. than the source code suggests. On a flip note: there have been some bugs regarding SSE impl. of String class. – bestsss Jun 22 '14 at 11:26