21

I was just looking through the implementation of Java's String class and the following struck me as odd:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {               // Here n is being decremented...
                if (v1[i] != v2[i])
                    return false;
                i++;                         // while i is being incremented
            }
            return true;
        }
    }
    return false;
}

This could easily be implemented with just one counting variable while n would be effectively final, like this:

while (i != n) {
    if (v1[i] != v2[i])
        return false;
    i++;
}

Theres also this, which gets rid of the i entirely:

while (n-- != 0) {
    if (v1[n] != v2[n])
        return false;
}

Does it have to do with comparison to 0 being (a miniscule bit) cheaper than to another variable or is there any other particular reason as to why it is implemented that way?

Marv
  • 3,517
  • 2
  • 22
  • 47
  • 10
    *reverse array traversal is (in comparison) much slower than forward traversal:* Where did you hear that? – Tunaki Mar 14 '16 at 16:59
  • 4
    You proably mean reverese List Iteration is slower. This is an array and the iteration and should not be slower. – AlexWien Mar 14 '16 at 17:00
  • Huh, I feel like I've heard that somewhere. Maybe I've got it confused with traversal of a 2D array, where your inner loop loops through the outer array. My bad, I'll edit it. – Marv Mar 14 '16 at 17:00
  • [It was stranger back in Java 6.](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/lang/String.java#String.equals%28java.lang.Object%29) But this is still curious; I see nothing to suggest why one would want to implement it in this manner, since it's definitely not a constant-time string comparison. – Makoto Mar 14 '16 at 17:02
  • And if you copy src code, which is copyrighted, please provide the original link in the post, – AlexWien Mar 14 '16 at 17:03
  • It only seems to have been this *specific* way since [Java 7u40-b43](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/lang/String.java#String.equals%28java.lang.Object%29) (in OpenJDKs versions on grepcode, at least). – Andy Turner Mar 14 '16 at 17:03
  • Related post for the Java < 8 implementation http://stackoverflow.com/q/12661335/1743880 – Tunaki Mar 14 '16 at 17:03
  • JDK 7u71 doesn't have the same implementation. Which implementation are you looking at? – sp00m Mar 14 '16 at 17:04
  • @sp00m this is JDK 8u73 – Marv Mar 14 '16 at 17:15
  • two counters used on Oracle 8u45 as well – beresfordt Mar 14 '16 at 17:18
  • 3
    It's interesting that [`Arrays.equals(char[], char[])`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/Arrays.java#Arrays.equals%28char%5B%5D%2Cchar%5B%5D%29) doesn't use this idiom. – Andy Turner Mar 14 '16 at 17:20
  • 1
    One can always microbanchmark `String.equals(String)` and `Arrays.equals(char[], char[])` and see what would happen – Antoniossss Mar 14 '16 at 17:23
  • @Marv ah yes. Well I think then maybe maybe this has to do with guarding against a multithreaded program not implemented correctly. So that if `anotherString` is being accessed and edited multiple places at the same time this failure will happen right away when the user changes the length of `anotherString` from 200 to 100 and goes to compare index 150 instead of 50 (with 150 being null now) will fail right away instead of having to iterate 100 more times. – jgr208 Mar 14 '16 at 17:33
  • 3
    @jgr208 String is immutable – beresfordt Mar 14 '16 at 17:34
  • 4
    @jgr208 you could also knacker about with Unsafe to mutate a String. In any normal use String is immutable – beresfordt Mar 14 '16 at 17:40
  • @beresfordt you have broken the first rule of programming in that case, assume everyone is an idiot and will do something stupid. – jgr208 Mar 14 '16 at 17:42
  • 5
    @jgr208 String is always immutable, and if you FORCE it to be mutable in non-ment-to-be-doing-that-way then it is your responsibility to include multithreading into your haxing application. I doubt that this is done because of "making string thrade safe even when it should not be" – Antoniossss Mar 14 '16 at 17:43
  • @Antoniossss so a question that ask our opinion on why something is done can't have a thinking outside the box answer? – jgr208 Mar 14 '16 at 17:45
  • 5
    @jgr208 sure, but the thing you have mentioned is just not the case here, obviously IMHO. – Antoniossss Mar 14 '16 at 17:48

3 Answers3

10

I think it has to to with the substring implementation prior to JDK 7.

At the time, the underlying character array size was not necessarily the string size. There were two fields, offset & count (i an j respectively) that were showing where is this string in the underlying array, so the method was :

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

After the change mentioned above, this method had to be also changed, so they just fixed n to be now the array length:

int n = value.length;

and got rid of j (because there's no offset anymore):

int i = 0;

Now because i has to be incremented after 2 usages, it is incremented in a separate statement :

if (v1[i] != v2[i])
    return false;
i++;

I think that's it, there's a more concise implementation which is evident if it's to write it from scratch but given that it was a change pushed by another change ... Oracle people are ordinary people just like us :)

Benchmarking

As for benchmarking String#equals vs Arrays#equals(char[], char[]) I think we have to compare apples with apples, so I've put the two approaches for comparing 2 character arrays together :

public static void main(String[] args) {
    final Random random = new Random();
    final int arrays = 10000;
    final int chars = 1000;
    // generate the arrays
    char[][] array = new char[arrays][chars];
    for (int i = 0; i < arrays; i++) {
        for (int j = 0; j < chars; j++) {
            array[i][j] = (char)(random.nextInt(94) + 33);
        }
    }
    // compare using Arrays equals
    long before = System.nanoTime();
    for (int i = 0; i < arrays; i++) {
        for (int j = 0; j < chars; j++) {
            equals_Arrays(array[i], array[j]);
        }
    }
    System.out.println(System.nanoTime() - before);
    // compare using String equals
    before = System.nanoTime();
    for (int i = 0; i < arrays; i++) {
        for (int j = 0; j < chars; j++) {
            equals_String(array[i], array[j]);
        }
    }
    System.out.println(System.nanoTime() - before);
}

private static boolean equals_Arrays(char[] a, char[] a2) {
    if (a == a2)
        return true;
    if (a == null || a2 == null)
        return false;

    int length = a.length;
    if (a2.length != length)
        return false;

    for (int i = 0; i < length; i++)
        if (a[i] != a2[i])
            return false;

    return true;
}

private static boolean equals_String(char[] v1, char[] v2) {
    if (v1 == v2)
        return true;
    if (v1 == null || v2 == null)
        return false;

    int length = v1.length;
    if (length == v2.length) {
        int i = 0;
        while (length-- != 0) {
            if (v1[i] != v2[i])
                return false;
            i++;
        }
        return true;
    }
    return false;
}

On my box I see no notable difference.

Bax
  • 4,260
  • 5
  • 43
  • 65
5

It is not real code for most of real Java VM with HotSpot. String.equals is very important method and it is implemented through intrinsic. It has platform specific native implementation. You can find full list here src/share/vm/classfile/vmSymbols.hpp (see do_intrinsic)

sibnick
  • 3,995
  • 20
  • 20
  • 1
    Nice catch, but why it is still stower than comparing 2 char arrays?? (see code below) I must admit, that string comparision is rather common in programming, so native approach sounds reasonable (better performance) but in the real world, looks like it is not that fast (as it could be) – Antoniossss Mar 14 '16 at 17:54
  • 1
    First of all Arrays.equals is intrinsic too. I suppose we can see time difference between them because Arrays.equals is static method and String.equals is object method. – sibnick Mar 14 '16 at 18:12
2

This is NOT AN ANSWER, justa pseudo banchmark:

public class CompString {

    public static void main(String[] args) {

        DecimalFormat df = new DecimalFormat("0.#################");
        int count = 1000000;
        int length = 20;
        int runs = 10;

        String[] strings = new String[count];
        String[] copiedStrings = new String[count];
        char[][] chars = new char[count][];
        char[][] copiedChars = new char[count][];

        for (int i = 0; i < count; i++) {
            String str = RandomStringUtils.random(length);

            strings[i] = new String(str);
            copiedStrings[i] = new String(str);

            chars[i] = Arrays.copyOf(str.toCharArray(), str.length());
            copiedChars[i] = Arrays.copyOf(chars[i], chars[i].length);
        }

        System.out.println("Lets banchmark that !!");
        int loop = 0;
        while (loop++ < runs) {
            System.out.println("Run #" + loop);
            long t = 0;
            long t0 = System.currentTimeMillis();
            for (int i = 0; i < count; i++) {
                strings[i].equals(copiedStrings[i]);
            }
            long t1 = System.currentTimeMillis();
            t = t1 - t0;

            System.out.println("Avg String.equals() duration: " + df.format(t / (double) count));

            t0 = System.currentTimeMillis();
            for (int i = 0; i < count; i++) {
                Arrays.equals(chars[i], copiedChars[i]);
            }
            t1 = System.currentTimeMillis();
            t = t1 - t0;

            System.out.println("Avg Arrays.equals(char[] char[]) duration: " + df.format(t / (double) count));
            System.out.println();
        }
    }
}

And here are the results:

Run #1
Avg String.equals() duration: 0,000017
Avg Arrays.equals(char[] char[]) duration: 0,000013

Run #2
Avg String.equals() duration: 0,000037
Avg Arrays.equals(char[] char[]) duration: 0

Run #3
Avg String.equals() duration: 0,000002
Avg Arrays.equals(char[] char[]) duration: 0

Run #4
Avg String.equals() duration: 0,000003
Avg Arrays.equals(char[] char[]) duration: 0

Run #5
Avg String.equals() duration: 0,000002
Avg Arrays.equals(char[] char[]) duration: 0

Run #6
Avg String.equals() duration: 0,000003
Avg Arrays.equals(char[] char[]) duration: 0

Run #7
Avg String.equals() duration: 0,000003
Avg Arrays.equals(char[] char[]) duration: 0

Run #8
Avg String.equals() duration: 0,000003
Avg Arrays.equals(char[] char[]) duration: 0

Run #9
Avg String.equals() duration: 0,000002
Avg Arrays.equals(char[] char[]) duration: 0

Run #10
Avg String.equals() duration: 0,000002
Avg Arrays.equals(char[] char[]) duration: 0

So because String.equals takes more time then comparing underlying arrays (with single iterator variable) we can rule out some magic JVM optimization issues here (but could it be in the past?).

Marv
  • 3,517
  • 2
  • 22
  • 47
Antoniossss
  • 31,590
  • 6
  • 57
  • 99
  • I think for an actual comparison between the 2 approaches you have to write 2 methods with the same signature : `boolean equals2(char[] v1, char[] v2)`, copy one from Arrays and deduct the second from String equals. I did it on my local and the results are ~ the same. – Bax Mar 14 '16 at 18:21
  • 1
    And use System.nanoTime() – AlexWien Mar 14 '16 at 19:09