WARNING: The reputation on this answer was added before everyone who had read the thread (including me) realised they had no clue what they were talking about. If there is an accepted answer at some point in the future which doesn't disagree with all the documentation on this algorithm, pay more attention to that!
In Short:
The 2*(whatever) bit is easy to explain, and I have done so below. The (whatever) bit has me absolutely baffled; as far as I can tell it should be
2*(count1)
and have absolutely nothing to do with the length of the second range. Either I've missed something (most likely), or the documentation is wrong (wouldn't that be nice...).
If you want my reasoning, see below. If I'm wrong and someone has the right answer, let me know so that I can delete this post!
Why it's 2 * (count1 etc.)
If you're asking why it's 2 * (count1 + count2) - 1, the key is in this line:
The elements are compared using operator< for the first version, and comp for the second. Two elements, a and b are considered equivalent if (!(a < b) && !(b < a)) or if (!comp(a,b) && !comp(b,a)).
Every time it checks whether two elements are equivalent, it does two comparisons with operator< (or comp): am I less than it, and is it less than me.
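To make that concrete, this is roughly what the documented equivalence check boils down to (equiv is just an illustrative name, not anything from the standard library):

```cpp
#include <functional>

// The documented equivalence test: up to two calls to the comparison per
// pair of elements, and no use of operator== at all.
template <class T, class Compare = std::less<T>>
bool equiv(const T& a, const T& b, Compare comp = Compare{})
{
    return !comp(a, b)    // comparison 1: is a less than b?
        && !comp(b, a);   // comparison 2: is b less than a? (only runs if 1 said "no")
}
```

Note that && short-circuits, so the second comparison only happens if the first one returns false; that detail matters later when counting odd numbers of comparisons.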
The MAXIMUM number of these "comparison pairs" it has to do is an awful lot harder to pin down. In fact, I can't figure it out. As far as I can tell, it should have nothing to do with the length of the second range...
Why it's not (count1 + count2 -1) or (count1 + count2) -1
I've looked at this for days and done some tests based on the code samples on cppreference, and I honestly don't know how this can be true.
The algorithms imply, and a source dug up by the OP says, that range2 must be a subset of range1 and that both ranges are sorted, so there is no need to check any element twice. That means the algorithms must check at most all the elements of range1, plus one extra check if an element of range2 is larger than every element in range1. Not only that, but it doesn't matter whether there are 2 or 20 elements in range2; the algorithm still does exactly the same number of comparisons, as the counting test below illustrates.
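One way to put numbers on that claim is to hand std::includes a comparator that counts its own calls. This only counts the element comparisons, not the iterator end-checks discussed below, and count_comparisons is just an illustrative helper, not a standard facility:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Count how many times std::includes invokes the comparator.
long count_comparisons(const std::vector<int>& r1, const std::vector<int>& r2)
{
    long calls = 0;
    std::includes(r1.begin(), r1.end(), r2.begin(), r2.end(),
                  [&calls](int a, int b) { ++calls; return a < b; });
    return calls;
}

int main()
{
    std::vector<int> range1 {1, 2, 3, 4, 5, 6, 7, 8};

    // Both range2s end on range1's largest element, so the whole of range1
    // gets walked. With a cppreference-style implementation both lines print
    // 16, i.e. 2*N1, regardless of N2 (exact numbers are implementation-specific).
    std::cout << count_comparisons(range1, {8}) << '\n';
    std::cout << count_comparisons(range1, {2, 4, 6, 8}) << '\n';
}
```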
There are two potential definitions of comparison, which will obviously give different answers:
comparison == All comparison operations in the algorithm
In this case, the following comparisons occur (they map directly onto the implementation sketch after this list):
- Have I reached the last range2 element
- Have I reached the last range1 element
- Is range2 elem < range1 elem
- Is range2 elem > range1 elem
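Those four checks are exactly what you see in the "possible implementation" on cppreference, which looks something like this (reproduced from memory, so treat it as a sketch rather than what any particular standard library actually ships):

```cpp
// Sketch of std::includes along the lines of the cppreference sample
// implementation. Both ranges must already be sorted.
template <class InputIt1, class InputIt2>
bool includes_sketch(InputIt1 first1, InputIt1 last1,
                     InputIt2 first2, InputIt2 last2)
{
    for (; first2 != last2; ++first1)               // 1: reached the end of range2?
    {
        if (first1 == last1 || *first2 < *first1)   // 2: end of range1?  3: range2 elem < range1 elem?
            return false;                           //    either way, range2 can't be a subset
        if (!(*first1 < *first2))                   // 4: range2 elem not > range1 elem, i.e. a match
            ++first2;                               //    so move on to the next range2 element
    }
    return true;
}
```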
In the simple case N1 == N2 == 1, this could generate at least 6 comparisons (the sequence 1, 2, 3, 4, 1, 2 in the sketch above, with range1 = {1} and range2 = {10}, for example), which is far more than either stated complexity allows. So this can't be the right definition.
comparison == checking elem1 and elem2 are equal
In this case, there are two comparisons for each element of range1, until either it has found all the elements of range2 or it reaches the end of range1 and stops.
Thus, as far as I can tell, the complexity is independent of N2, and it should be
2*(N1)
Note that for this definition of "comparison", 2*(N1 + N2 - 1) would only appear to hold for N2 == 1, and 2*(N1 + N2) - 1 never holds, because the number of comparisons only comes out odd in a non-maximal-complexity case (range2 contains a number which is not in range1 AND is not greater than max(range1)).
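That odd-count case is easy to provoke with the same counting trick as earlier; again, the exact figure depends on your standard library, but a cppreference-style implementation bails out after a lone "range2 elem < range1 elem" check:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // 2 is not in range1 and is smaller than max(range1), so the search can
    // give up after a single unpaired comparison.
    std::vector<int> range1 {1, 3};
    std::vector<int> range2 {2};

    long calls = 0;
    std::includes(range1.begin(), range1.end(), range2.begin(), range2.end(),
                  [&calls](int a, int b) { ++calls; return a < b; });

    // With a cppreference-style implementation this prints 3: an odd count,
    // and below the 2*N1 = 4 maximum.
    std::cout << calls << '\n';
}
```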
Any other definition of "comparison" would be selective. The only other thing I can think of is that the compiler optimises out certain steps, such as not re-checking whether it has reached the end of range2 when the range2 iterator wasn't advanced (which would also make the count depend on N2, as required), but I can't see anything else it could optimise away to get the number down to something that matches either stated complexity.
...anyone else got a better answer? I'm now as curious as the OP is about this.