why c++ algorithm includes have at most 2*(count1+count2)-1 comparisons

Question

While reading a cplusplus.com article, I noticed the it said the following:

Up to linear in twice the distances in both ranges: Performs up to 2*(count1+count2)-1 comparisons (where countX is the distance between firstX and lastX).

However, cppreference.com states:

At most 2·(N1+N2-1) comparisons, where N1 = std::distance(first1, last1) and N2 = std::distance(first2, last2).

Given that the N1 == Count1 and N2 == Count2 by definition, which site is correct, and can anyone explain how this maximum number or comparisons was calculated, namely:

What is the worst case scenario that defines this maximum complexity
can anyone explain the step in that scenario and show which of the above (cplusplus or cppreference) has the correct number of comparisons?

Also, prefer cpprefence over cplusplus, some senior members told me recently that the latter can sometimes contain bad practice or even errors. http://en.cppreference.com/w/cpp/algorithm/includes — dfrib, Dec 07 '15 at 13:46
(http://stackoverflow.com/questions/6520052/whats-wrong-with-cplusplus-com) — dfrib, Dec 09 '15 at 14:42
Can we reopen this question please? I think the question has been made sufficiently clear now and the answer is really bugging me. — Ieuan Stanley, Dec 10 '15 at 12:13

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

WARNING: The reputation on this answer was added before everyone who had read the thread realised they had no clue what they were talking about (including me). If there is an accepted answer at some point in the future which doesn't disagree with all documentation on this algorithm, pay more attention to that!

In Short:

The 2*(whatever) bit is easy to explain and I have done so below. the (whatever) bit has me absolutely baffled and as far as I can tell it should be

2*(count1)

and have absolutely nothing to do with the length of the second range. Either I've missed something (most likely), or the documentation is wrong (wouldn't that be nice...).

If you want my reasoning, see below. If I'm wrong and someone has the right answer, let me know so that I can delete this post!

Why it's 2 * (count1 etc.)

If you're asking why it's 2 times (count1 + count2) -1, The key is in this line:

The elements are compared using operator< for the first version, and comp for the second. Two elements, a and b are considered equivalent if (!(a < b) && !(b < a)) or if (!comp(a,b) && !comp(b,a)).

Every time it compares two elements, it does two comparisons: am I not equal to it and is it not equal to me.

The MAXIMUM amount of these "comparison pairs" it has to do is an awful lot harder to pin down. In fact, I can't figure it out. As far as I can tell, it should have nothing to do with the length of the 2nd range...

Why it's not (count1 + count2 -1) or (count1 + count2) -1

I've looked at this for days and done some tests based on the code samples in cppreference and I honestly don't know how this can be true.

The algorithms imply and a source dug up by the OP says that range2 must be a subset of range1, and both are sorted, so there is no necessity to check an element twice. That means that the algorithms must check at most all the elements of range1, plus one extra check if an element of range2 is larger than any element in range1. Not only that, but it doesn't matter where there are 2 or 20 elements in range2, it still does exactly the same amount of comparisons.

There are two potential definitions of comparison, which will obviously give different answers:

comparison == All comparison operations in the algorithm

In this case, the following comparisons occur:

Have I reached the last range2 element
Have I reached the last range1 element
Is range2 elem < range1 elem
Is range2 elem > range1 elem

In the simple case N1 == N2 == 1, this could generate at least 6 comparisons (1,2,3,4,1,2: where range1 = {1} and range2 = {10}, for example), which is far more than either algorithm allows. So this can't be the case.

comparison == checking elem1 and elem2 are equal

In this case, there are two comparisons for each element of range1 until it has found all the elements of range2 or it gets to the end of range1 where it stops (upon finding that it has reached the end of range1).

Thus, as far as I can tell, the complexity is independent of the length of N2, and the complexity should be

2*(N1)

Note that for this definition of "comparison", 2*(N1 + N2 - 1) would only appear to hold for N2 == 1, and 2*(N1 + N2) -1 never holds as the number of comparisons is only odd in a non-maximal complexity case (range2 has a number which is not in range1 AND not greater than max(range1)).

Any other definition of comparison would be selective. The only other thing I can think of is that the compiler optimises out certain steps, like not checking whether it's reached the end of range2 again when the element isn't incremented (which would also make the algorithm dependent on N2 as required), but I can't see anything else that it could optimise out in order to get the number down to something that matches either stated complexity entirely.

...anyone else got a better answer? I'm now as curious as the OP is about this.

@TuXiaomi : `std::includes` "Returns true if every element from the sorted range `[first2, last2)` is found within the sorted range `[first1, last1)`." It doesn't matter that range2 is not a subset of range1, it's only checking whether each element exists in the other range. — Ieuan Stanley, Dec 09 '15 at 13:54
for each element in [searchBeg,serchEnd), there must be an equal element in [beg, end).If elements in [searchBeg,searchEnd)are equal,[beg,end）must contain the same number of elements.Thus,[searchBeg, searchEnd)must be a subset of [beg,end) Nicolai M. Josuttis write in page 609,in The C++ Standard Library,2nd edition,3rd print. I write a program in vs2015, and it prove that range2 must be subset of range1, when return :be included. — Tu Xiaomi, Dec 09 '15 at 14:29
and cpprefernce.com said the complexity is 2·(N1+N2-1), rathe than 2·(N1+N2-)1 in cpluscplus.com — Tu Xiaomi, Dec 09 '15 at 14:34
@TuXiaomi, looks like you're absolutely right - I was misled by the wording of the definition. I'll study the implementations referenced at that link and modify my answer. — Ieuan Stanley, Dec 09 '15 at 14:49
Stackoverflow, @IStanley thank you for your reply,I am still puzzled with the problem, so I am waiting for your answer! — Tu Xiaomi, Dec 09 '15 at 14:56

score 0 · Answer 2 · answered Dec 07 '15 at 13:53

0

First of all, it has to do 2 comparisons to check for equivalence which is defined as:

 if (!(a<b) && !(b<a))

or

if (!comp(a,b) && !comp(b,a))

then it has to check the whole of each possible range against each other minus 1.

answered Dec 07 '15 at 13:53

Paul Evans

27,315
3
37
54

score 0 · Answer 3 · answered Dec 07 '15 at 13:57

The key is in this line

Two elements, a and b are considered equivalent if !(a < b) && !(b < a) or if !comp(a,b) && !comp(b,a)

Let us take for instance the less-than predicate < and the sorted sets

[2,3]
[1,2,3,4]

the element 2 and 1 are compared: 2 < 1 returns false, so they can be equivalent elements or 2 can be greater than 1. If they are equivalent, 1 < 2 will also return false, but it does not. Therefore 1 is discarded for the second set and the value 2 of the second set is considered next.

Maximum comparisons: 2*(M+N)-1. The algorithm runs in O(N+M).