1

I had found that list::unique() removes only consecutive elements in a list. I would like to know the reason why the method functions like this.

Remove duplicates from a list explains how to remove the duplicate elements from a list but it doesn't explain why just the consecutive elements are only removed.

The following is my test code:

#include <algorithm>
#include <iostream>
#include <list>
#include <string>
#include <utility>

using namespace std;

void print_message( const pair<int,string>& message )
{
    cout << message.first << " - " << message.second << endl;
}

int main( )
{
list< pair< int, string > > message;
list< pair< int, string > >::iterator ip;

message.push_back( make_pair( 1, string("Test Foo One")) );
message.push_back( make_pair( 1, string("Test Foo Two")) );
message.push_back( make_pair( 1, string("Test Foo Two")) );
message.push_back( make_pair( 1, string("Test Foo Three" )) );
message.push_back( make_pair( 1, string("Test Foo Two" )));
message.push_back( make_pair( 1, string("Test Foo Two")) );

cout << "ORIGINAL MESSAGES" << endl;
ip = message.begin();
while(ip!=message.end()) {
        print_message(*ip);
        ip++;
}

message.unique();
cout << "\nSEQUENTIAL DUPLICATES REMOVED" << endl;
ip = message.begin();
while(ip!=message.end()) {
        print_message(*ip);
        ip++;
}

}

Why is this method list::unique() designed just to remove consecutive duplicate elements in the list?

Community
  • 1
  • 1
Praveen Vinny
  • 2,372
  • 6
  • 32
  • 40
  • I don't understand the "why" question. A sequence needs that operation as one of the basic operations, i.e. "remove adjacent equivalent elements" is a natural basic operation for a sequence. And `unique` happens to be the implementation of that basic operation in STL. – AnT stands with Russia May 18 '16 at 23:11
  • I was doubtful since the method name is `unique()` but it removes only consecutive duplicates. – Praveen Vinny May 18 '16 at 23:12
  • I.e. you don't like the *name*. You don't like the fact that the name "unique" was assigned to this specific functionality. It has historical roots, like UNIX `uniq` command, which also compares only adjacent lines in a stream. – AnT stands with Russia May 18 '16 at 23:12
  • Does sorting the list first to get all the identical elements together fix it? The same as on unix typing `cat x | sort | uniq` – Jerry Jeremiah May 18 '16 at 23:13
  • 1
    By unique, it means distinct values to the best of my knowledge and it's confusing. Is there some reason why the method `unique` is designed to remove just the consecutive duplicates? – Praveen Vinny May 18 '16 at 23:14
  • Sorting removes the ordering. – Praveen Vinny May 18 '16 at 23:15
  • @Praveen Vinny: You are going in circles again. A basic operation of "removing just the consecutive duplicates" is absolutely needed as a basic operation. It *has* to be implemented, no question about it. So, the only question here whether it should be called `unique` or something else (like `unique_adj`, `unique_seq` etc.). The library authors decided to call it `unique`. – AnT stands with Russia May 18 '16 at 23:15
  • There's a very important _why_ ... Removing sequential duplicates is far more efficient than removing scattered duplicates. It you want to remove _all_ duplicates, you can ***choose*** to sort first, making all duplicates sequential. – Disillusioned May 18 '16 at 23:15
  • @CraigYoung Your comment answers the reason from the algorithmic complexity perspective. Is there some other reason? – Praveen Vinny May 18 '16 at 23:17

2 Answers2

5

Design of the standard library follows the approach of providing the user with a comprehensive set of basic, relatively minimalistic algorithmic "building blocks", which, if necessary, can be manually combined into more complex algorithms. Removal of consecutive equivalent elements happens to such minimalistic algorithm. Removal of non-consecutive equivalent elements in a non-ordered sequence is definitely not a minimalistic algorithm. It is not a natural algorithm for a non-ordered sequence.

Typically, in order to achieve the latter in an non-ordered container the user would have to perform sort and then perform unique. Since the library already provides sort and unique as basic "building blocks", there's no need to provide a separate version of unique algorithm that would work with non-adjacent elements. In that sense sort and unique form an idiomatic pair, just like remove_if and erase in the well-known erase–remove idiom.

Also, the problem with implementing non-adjacent unique is that a "clean" implementation of such functionality would call for a non-reordering operation, i.e. repetitive elements should be removed, but the relative order of the remaining elements should remain unchanged. It is not possible to implement such functionality as a minimalistic "building block". And I'd say that there's not much need for it, since in most practical cases the user will either be happy with sort-then-unique combination or use a different container type altogether.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • That's well said @AnT – Praveen Vinny May 19 '16 at 00:09
  • This doesn't address the "why call it `unique` instead of something more explicit and longer". The answer to which would be "C++ is so far from caring about readability or coherence that after a long number of people thought and thought about the issue, they still added a keyword called [`alignas`](http://en.cppreference.com/w/cpp/language/alignas)." Try pronouncing that. – HostileFork says dont trust SE May 19 '16 at 10:03
2

I agree with your premise that the usage of the word unique is not ideal, relative to longer names perhaps as consecutive_unique. "(The weirder the operation, the longer the name should be)" I would not--myself--have thought mere unique a good name for this, using a longer name and then saving the short name for what you would intuit.

So...why? I am reminded of a hypothetical interview with Richard Feynman, regarding the question of "why are manhole covers round". I'll cite this bit:

Interviewer: Do you believe there is a safety issue? I mean, couldn't square covers fall into the hole and hurt someone?

Feynman: Not likely. Square covers are sometimes used on prefabricated vaults where the access passage is also square. The cover is larger than the passage, and sits on a ledge that supports it along the entire perimeter. The covers are usually made of solid metal and are very heavy. Let's assume a two-foot square opening and a ledge width of 1-1/2 inches. In order to get it to fall in, you would have to lift one side of the cover, then rotate it 30 degrees so that the cover would clear the ledge, and then tilt the cover up nearly 45 degrees from horizontal before the center of gravity would shift enough for it to fall in. Yes, it's possible, but very unlikely. The people authorized to open manhole covers could easily be trained to do it safely.

In this context, the why will come down to: "The people authorized to open manhole covers could easily be trained to do it safely."

Even if a C++ function has the "intuitive" name, that's no protection against its behavior. You must be cognizant of a lot of details to use a library class or function. In a way, it's good to train you to imagine every method or function is called X... so you read to see what X actually promises contractually.