3

I have a noob javascript question. Let's say we have two very large strings (~ million characters or more) that are equal - they have the same length and the same content. Let's say we have these two function that both do the same thing (compare strings):

function equals1(a, b) {
    return a === b;
}

function equals2(a, b) {
    if (a.length !== b.length) {
           return false;
    }
    for (var i = 0; i < a.length; ++i) {
        if (a[i] !== b[i]) {
            return false;
         }
    }
    return true;
}

Why is the first function (equals1()) almost as twice as fast as the second one? How can the second one be improved so that it performs as good as the first one?

Michail Michailidis
  • 11,792
  • 6
  • 63
  • 106
  • 8
    ... Because it is implemented in native code? Why do you want to reimplement string comparison? – John Dvorak Oct 23 '14 at 16:17
  • 1
    The most simple explanation is to just look at the code. The first function has a single line of code. The second has more than one line of code. Why would the second be any faster than the first? Why would you want it to be? – Heretic Monkey Oct 23 '14 at 16:18
  • 2
    @joel apart from not being exactly accurate - why do you think it would be faster? – John Dvorak Oct 23 '14 at 16:20
  • try using a data structure like suffix tree for comparison – QuakeCore Oct 23 '14 at 16:21
  • 3
    I swear, people will upvote any question that has the word "performance" in the title. –  Oct 23 '14 at 16:21
  • Note about the [(strict) comparison algorithm](http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.6): *"If `Type(x)` is `String`, then return `true` if `x` and `y` are exactly the same sequence of characters (same length and same characters in corresponding positions); otherwise, return `false`."* . So the algorithm already checks the length of the strings. – Felix Kling Oct 23 '14 at 16:32
  • 1
    I do not want to reimplement String comparison :). This is just a hypothetical question... – Tihomir Meščić Oct 23 '14 at 16:32
  • Please check my answer http://stackoverflow.com/a/26532637/986160 - I don't know why people are so mean and downvote a good or best answer.. :( Thanks – Michail Michailidis Oct 23 '14 at 18:11
  • 1
    If you assign the string values in quotes and you don't build them incrementally then they are interned and as the JsPerf shows (see my answer) the time is constant as I was insisting http://jsperf.com/eqaulity-is-constant-time I am curious to see your benchmark. Thanks – Michail Michailidis Oct 24 '14 at 18:29
  • I gave a concise answer to my question (for further reference): http://stackoverflow.com/a/26554820/986160 – Michail Michailidis Oct 24 '14 at 19:25

4 Answers4

6

Most probably Javascript is doing string interning (Do common JavaScript implementations use string interning?) according to a person that is in ECMAScript committee. I thought that then === would be O(1) but based on performance test of the original poster it is O(n) as doubling the string doubles the time for the equality.. That is really sad that string interning is not used they way it should be.

Update: JSPerf

The orginal poster claims should be backed up for O(N) complexity From http://jsperf.com/eqaulity-is-constant-time It seems that even if I have 16x bigger string the time doesn't change more than 1-2%

So please reconsider those things that I have striked-through and your down votes

In other words:

when you do

var str1 = "stringwithmillionchars"; //stored in address 51242
var str2 = "stringwithmillionchars"; //stored in address 12313

the "stringwithmillionchars" will be stored once let's say in address 201012 of memory and both str1 and str2 will be "pointing in this address 201012. This address could be determined with some kind of hashing to map to specific locations in memory.

So when doing

"stringwithmillionchars"==="stringwithmillionchars"

would look like

getContentOfAddress(51242)===getContentOfAddress(12313)

or 201012 === 201012

which takes O(1)/constant time

The for loop in your example (equals2()) has O(N) time, where N the length of both strings. That is because it has to do N comparisons between each pair of characters and N comparisons between i and str.length.

Note: the address numbers were chosen randomly for illustration purposes..

Important: Based on performance comparisons from my question(Why Javascript ===/== string equality sometimes has constant time complexity and some times has linear time complexity) interning happens only when the strings are assigned directly with quotes otherwise the comparison will take linear time instead of constant because char-to-char comparison happens.

Community
  • 1
  • 1
Michail Michailidis
  • 11,792
  • 6
  • 63
  • 106
  • Why do you think that strings are hashed? Or compared using only internal references? Of course, an engine might do some optimisations if you have `a==a`, but if two string literals or differently created strings are compared that needs to happen (at least once) by value. – Bergi Oct 23 '14 at 17:27
  • 1
    @Bergi http://stackoverflow.com/questions/5276915/do-common-javascript-implementations-use-string-interning If you are familiar with String interning then what it needs to do is compare the memory addresses of the two strings - if it is the same then the two strings are the same – Michail Michailidis Oct 23 '14 at 18:06
  • Yes, a bit. But quoting from there: "*constant strings are interned automatically, results of string concatenation, etc aren't*". And even when constants are interned, they need to be at least once compared to the values in the pool. – Bergi Oct 24 '14 at 00:43
  • 2
    Thank for the answer but I think that this is not what happens. I tried increasing the length of Strings by the factor of 2, and the comparison (a === b) takes twice as much time (hence, it is also O (n)) – Tihomir Meščić Oct 24 '14 at 08:08
  • Thanks @TihomirMeščić - could you please send put a jsperf link in your original post with all these different equals functions and different string sizes? – Michail Michailidis Oct 24 '14 at 13:58
  • @TihomirMeščić I want to she how you are backing that there is a linear increase in time with linear increase of string length: Here is the JSPerf for 16x bigger string and all of the equalities are about the same: http://jsperf.com/string-equals-comparison-is-it-constant-time – Michail Michailidis Oct 24 '14 at 14:54
  • @TihomirMeščić check this updated comparison because the other ones will "break" the loops as they had different string length comparisons - here are pairs and I double, quad and 16x both pairs and the time is constant: http://jsperf.com/string-equals-comparison-is-it-constant-time – Michail Michailidis Oct 24 '14 at 15:12
  • 2
    I'd recommend you to leave it and don't waste more time on this. Your effort is remarkable, but I think you made your point. You're getting 100 bounty from me for your efforts. – lexicore Oct 24 '14 at 16:52
  • Thanks :) I will keep looking deeper to have a pleasing answer ;) – Michail Michailidis Oct 24 '14 at 17:01
  • 1
    Interesting. It looks like javascript will do string interning when you use string literals. However, in my test case, I generated the strings by simply concatenating them in a loop (it would be difficult to type one million characters :). http://jsperf.com/equality-no-interning – Tihomir Meščić Oct 24 '14 at 18:52
  • It is not that difficult to type - exponentially they increase if you copy paste ;) - Thanks for the JsPerf – Michail Michailidis Oct 24 '14 at 19:05
  • 1
    @TihomirMeščić "_... constant strings are interned automatically, results of string concatenation, etc aren't_" ([See comment by olliej on accepted answer](http://stackoverflow.com/questions/5276915/do-common-javascript-implementations-use-string-interning?lq=1)) – Rafael Emshoff May 27 '15 at 07:29
  • @RafaelCichocki That's what the jsperf tests showed :) - I still got all the down votes that day.. – Michail Michailidis May 27 '15 at 18:53
2

The first function is faster because it doesn't have to check if i < a.length one million times, and perform an increment operation on i one million times.

dave
  • 62,300
  • 5
  • 72
  • 93
1

You can do something like the following to make your equals2 function twice faster:

function equals2(a, b) {
    if (a.length !== b.length) {
           return false;
    }
    for (var i = 0; i < a.length/2; ++i) {
        if (a[i] !== b[i] || a[a.length-i-1] !== b[b.length-i-1]) {
            return false;
         }
    }
    return true;
}
UtkarshPramodGupta
  • 7,486
  • 7
  • 30
  • 54
0

The first does the same things. If the strings are different lengths, it will return false. Then it checks to see if the characters are the same at the same indices. However, it is implemented at the JS implementation level, so it is running as fast as C, C++, Java or whatever language the JavaScript implementation is written in.

Brennan
  • 5,632
  • 2
  • 17
  • 24
  • Yes, which is what I said here – Brennan Oct 23 '14 at 16:36
  • He wrote the code so he knows that. He is interested only in the example of two same length and content strings. The performance issue is that the one is doing 2*1000000 comparisons and the other only one . – Michail Michailidis Oct 23 '14 at 16:38
  • It has to do more than one comparison though, how do you think it does the comparison behind the scenes? – Brennan Oct 23 '14 at 16:40
  • it just compares the two integers (hashes) which is way faster than char-per-char comparison ;). Of course in the ALU of your CPU an int comparison is not a single instruction but this is offtopic and we consider it O(1) or constant time – Michail Michailidis Oct 23 '14 at 16:44