31

What's the quickest to compare two strings in Java?

Is there something faster than equals?

EDIT: I can not help much to clarify the problem.

I have two String which are sorted alphabetically and EXACTLY the same size

Example: abbcee and abcdee

Strings can be long up to 30 characters

Mediator
  • 14,951
  • 35
  • 113
  • 191
  • 12
    Why would `equals()` be slow for you? – BoltClock Sep 27 '10 at 16:01
  • 6
    Have you profiled your app, and was the conclusion that the hotspot in your code was caused by `String.equals(...)`? If you haven't profiled your app, why do you think `String.equals(...)` is (or could be) a problem? – Bart Kiers Sep 27 '10 at 16:04
  • 4
    His question does not state that equals is slow. Just wondering if there is something faster than equals(). – Sagar Sep 27 '10 at 16:05
  • 2
    his question does state that equals is slow (or at least that it's not fast) when he says 'or something faster than equals' – KevinDTimm Sep 27 '10 at 16:21
  • 1
    Agreed - as it stands, this is a bad question. If you want something faster than `equals()`, then **either** you have some very specific performance requirements backed by measurements (in which case these must be posted before any appropriate answers can be given), **or** you actually don't (have unusual performance requirements), in which case you should just use equals(). Implying "equals isn't fast enough" without any justification gives people nothing to work with. – Andrzej Doyle Sep 27 '10 at 16:55
  • @Andrzej Doyle Yes, I can not help much to clarify the problem. I have two lines which are sorted alphabetically and EXACTLY the same size Example: abbcee and abcdee Strings can be long up to 30 characters – Mediator Sep 27 '10 at 17:11
  • @simply denis, could you answer my questions? – Bart Kiers Sep 27 '10 at 17:34
  • @Bart, I Create app, but I'm looking for ways to optimize – Mediator Sep 27 '10 at 17:49
  • @simply denis, sorry, that does not answer my questions. Again: have you profiled your app, and was the conclusion that the hotspot in your code was caused by String.equals(...)? If you haven't profiled your app, why do you think String.equals(...) is (or could be) a problem? – Bart Kiers Sep 27 '10 at 17:49
  • @Bart, I do not understand you = (I do not know much Russian and English – Mediator Sep 27 '10 at 18:20
  • @simply denis, I asked about 'profiling', which is explained [here](http://tiny.cc/wkt15).I also mentioned 'hotspot', which is explained [here](http://tiny.cc/k6glr). My second question *"... why do you think String.equals(...) is (or could be) a problem?"* isn't that hard, right? – Bart Kiers Sep 27 '10 at 18:28
  • @Bart, I do not use special "profiling". I use common measurement of time. I do not think that "String.equals (...)" there is a problem, but I want to know is there any way better. – Mediator Sep 28 '10 at 06:35

7 Answers7

36

I don't expect that Sun Oracle hasn't already optimized the standard String#equals() to the max. So, I expect it to be already the quickest way. Peek a bit round in its source if you want to learn how they implemented it. Here's an extract:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}
BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
  • 1
    This looks pretty optimized to me.... it would be theoretically possible to optimize it further for the OP's specific constraints (e.g. using the knowledge that the strings are already of equal length, and higher likelihood of different characters in the middle of the string), but you obviously can't do that in practice because the class is final and the fields are private .... +1 for digging out the source! – mikera Sep 27 '10 at 18:26
  • I don't understand why they didn't compare the hashcode before doing a whole string compare. That would be faster. – Stephan Mar 24 '12 at 09:39
  • 10
    @Stephan: that would have been more inefficient. The `hashCode()` loops over all characters of the string to perform the calculation. If the `hashCode()` was after all not the same, then the `equals()` would basically need to loop over all characters *for the second time*. – BalusC Dec 21 '12 at 16:14
  • 6
    @BalusC That's only a part of the truth. Once calculated, the `hashCode()` method stores the hash code into an int, so the next comparison will be really fast. – Stephan Jan 04 '13 at 08:33
  • 2
    Please note that String::equals and many other String methods are intrinsics, replaced by the compiler with a pre-baked blob of assembley for the particular architecture. The Java code is only relevant before the intrinsic is in place, so hardly ever. – Nitsan Wakart Nov 09 '16 at 15:18
  • As @NitsanWakart pointed out, String equals is an intrinsic method. The mentioned example is also mentioned [here](https://shipilev.net/talks/joker-Oct2014-string-catechism.pdf), search for header (Equals: Implementation). Also benchmarks are provided. There is almost never a reason to provide your own implementation and no benchmarks are provided, proving your point, thus I down voted this answer. – u6f6o Dec 06 '16 at 10:32
  • why are they calling `anObject instanceof String` instead of `anObject.getClass() == String.class`? I thought that that is more efficient than `instanceof` and the String class is final... – xeruf Nov 18 '17 at 21:15
  • @Xerus instanceof returns false on null. – BalusC Nov 18 '17 at 21:17
  • what about `anObject != null && anObject.getClass() == String.class`? – xeruf Nov 18 '17 at 21:21
  • @Xerus instanceof is faster. – BalusC Nov 18 '17 at 22:15
28

Compare Strings of same length faster using the hashcode:

public static boolean equals(final String s1, final String s2) {
return s1 != null && s2 != null && s1.hashCode() == s2.hashCode()
    && s1.equals(s2);
}

You can test it, my results are for 4000000 compare operations including identical, equal and different strings:

String.equals(String):  177081939
equals(String, String):  44153608

Note: Calculating the hashCode of a new string object takes some computation time and afterwards the hashCode is stored in the object. So my suggested improvement will only be faster than the default compare if string objects are reused. In my application I am using String constants and store strings in collections. Multiple comparisons of strings using my method are actually faster for me, but it may not be in general.

If the method is used with new strings all the time like compare("a", "b"), it won't be an improvement.

So the fastest way of comparing strings depends on:

  • Whether your string objects are reused (like from a collection) or are always new (like from an input stream)
  • Whether your strings have different lengths
  • Whether your strings differ at the start or the end of the string
  • Your programming style, how much constants are used
  • Your use of String.intern()

Ignoring those facts, the majority of all programs will be fine with String.equals().

Stephan
  • 4,395
  • 3
  • 26
  • 49
  • +1 I've been using this for a lot of "word crunching" and the perf is fantastic – xchiltonx Dec 16 '13 at 22:08
  • 9
    I think it's worth mentioning that there might be some hashcode collisions, so there is a very, very slim chance that comparing hashes will return false positives. This explains the fact that you still have to use equals. For this reason, I think this will be slower if the majority of your strings are equal. – Nepoxx Oct 22 '14 at 18:56
  • 1
    Why did you [add "of some length"](http://stackoverflow.com/revisions/9850634/2)? – Flow Feb 27 '15 at 17:17
  • 2
    How is this faster? You still use `s1.equals(s2)`. – vedi0boy Dec 16 '15 at 17:05
  • I believe there would be hardly any performance improvement....for more info check http://stackoverflow.com/questions/14262431/why-does-the-equals-method-in-string-not-use-hash – Sumit Kumar Saha Jan 11 '16 at 14:50
  • @SumitKumarSaha Thanks for the link, you are partly right, so I updated my answer. I expect that people are using such optimizations if they know what they are doing. Programs that really need such a tuning will read, compute and write data. I assume that during the computation, the string data is stored in collections and compared to constants or each other, so there is an optimization. – Stephan Jan 13 '16 at 07:14
  • @Stephan I believe there you woud see optimizaion i.e inprovement in time when such comarision move to the order of 10^7 , i didn't see such in my application as there were comparsion to the order of around 10^4 – Sumit Kumar Saha Jan 13 '16 at 12:23
5

I had tries different combinations for string comparison (code here):

1. s1.equals(s2)
2. s1.length() == s2.length() && s1.hashCode() == s2.hashCode() && s1.equals(s2)
3. s1.hashCode() == s2.hashCode() && s1.equals(s2);
4. s1.length() == s2.length() && s1.equals(s2);

I used strings of 40 chars length, in 10000000000L iterations and before any iteration I reinitialized the strings.

for equal stings I got:

equal: 2873 milis ???
equal: 21386 milis
equal: 7181 milis
equal: 2710 milis ???

for same size strings but last char different I got:

different: 3011 milis
different: 23415 milis
different: 6924 milis
different: 2791 milis

for different sizes, almost same strings but one char added at the end for s2:

different size: 3167 milis
different size: 5188 milis
different size: 6902 milis
different size: 2951 milis

seems to me it is best to use first a string.length() comparison before equals().

But this will not matter almost at all because this is the case where I have 10^10 string comparisons with 40 chars length and what is bizarre for me is the case where for equal strings I have a better speed when I compare the string length first.

Jed Fox
  • 2,979
  • 5
  • 28
  • 38
ungalcrys
  • 5,242
  • 2
  • 39
  • 23
  • 7
    I think there's something wrong with your data. When you're comparing Strings of the same length, how could algorithm 4 (comparing length, then using .equals()) be faster than algorithm 1 (comparing only using .equals()). For these cases algorithm 4 is doing an unnecessary String length comparison that will always return true. – Tyler Oct 12 '15 at 23:40
4

If you can show that it's a significant bottleneck, which would surprise me, you could try

s1.hashCode() == s2.hashCode() && s1.equals(s2)

It might be a bit faster. It mightn't.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • This was my first idea, too. Since string are immutual (is this really spelled right?) you're basically comparing constant ints here, which should be fast. Could be a problem only if the objects are equal most of the time, you could dynamically swap the implementation then. Too sad I don't have the jdk on this machine, would love to profile that now. – atamanroman Sep 28 '10 at 08:35
  • 1
    Yes, it is faster. But you need to do a `null` check before. – Stephan Mar 24 '12 at 09:38
  • 1
    I doubt that would be faster unless you cache hashcode in some way. I think equals is faster than computing a hash code. – jontro Apr 16 '13 at 10:23
  • both strings should calculate their hashcodes before (which happens only if the string was used as key in hashmap) also 0 hash still valid value. – Sergey Ponomarev Jul 22 '18 at 00:01
  • @jontro Hashcodes are not computed on every call for `String`. As `String` is invariant, they can be precomputed, or cached, internally, and indeed they are. – user207421 Aug 19 '23 at 04:11
3

It depends what you need. I think equals() is really optimized but perhaps you need something else faster than equals(). Take a look at this post.

oyo
  • 1,906
  • 2
  • 16
  • 22
1

Simple answer

String.equals(Object)

I'm pretty sure (this answer has some references) and it's very likely that the JIT will have an intrinsic for String#equals, which means it would be able to replace the call with specially crafted machine code for the architecture your JVM is currently running on.

Community
  • 1
  • 1
Flow
  • 23,572
  • 15
  • 99
  • 156
0

As always, you will need to benchmark for your application / environment. And unless you have already profiled and idetified this as a performance bottleneck, it's probably not going to matter ("premature optimization is the root of all evil").

Having said that:

a.equals(b) is really fast for Strings. It's probably one of the most tightly optimized pieces of code in the Java platform. I'd be very surprised if you can find any faster way of comparing two arbitrary strings.

There are special cases where you can cheat and use (a==b) safely, e.g. if you know that both Strings are interned (and therefore value identity implies object identity). In that case it may be slightly faster than a.equals(b) - but again this depends on compiler/JVM implementation. And it's very easy to shoot yourself in the foot if you don't know what you are doing.....

mikera
  • 105,238
  • 25
  • 256
  • 415
  • p.s. I have just micro-benchmarked this, and (a==b) does beat a.equals(b) by a factor of about 2-4 times (30ns vs. 70-110ns) in my environment (Eclipse on Sun Java 1.6). YMMV, and usual caveats about micro-benchmarking do of course apply :-) – mikera Sep 27 '10 at 16:39
  • Looking at the implementation code posted by @BalusC, I can see absolutely no heavy optimizations, nothing at all to warrant your statement. Granted, optimizing this already trivial code isn’t easy. But low-level, what *could* have been done to optimize is to move from char-wise to int-wise comparison (obviously this requires low-level tricks not readily available in Java, and it might not be faster after all). – Konrad Rudolph Sep 27 '10 at 17:00
  • Hmmm It looks extremely tightly optimized to me, to the extent that e.g. they are re-using the string length as a negative loop counter (which is a classic low-level optimization). I can't see personally see any additional optimization that could be made, short of abandoning pure Java and dropping to a specialized native implementation (and it's possible that the JIT does this anyway....) – mikera Sep 27 '10 at 18:11
  • You can use `equals()` and intern strings safely. The `equals()` method does check for identity. – Stephan Mar 24 '12 at 09:16