Counting Duplicates in C++ - multiset?

Question

UPD:-
Value Instances
2 3
3 2
5 1

I want to limit the count to 1 for all the instances present in the multiset.

#include<bits/stdc++.h>
using namespace std;
int main() {
    multiset<int> p1;
    
    p1.insert(5);
    p1.insert(2);
    p1.insert(3);
    p1.insert(3);
    p1.insert(2);
    p1.insert(2);
    
    for(auto itr : p1) {
        
        if(p1.count(itr) > 1)
        p1.erase(itr);

        cout << itr;
    }
    
}

How to fix this ?

You cannot combine a range loop with `erase()` (AFAIK). `erase()` invalidates the current iterator (but returns a new one). So, this would work: `for (multiset::iterator iter = p1.begin(); iter != p1.end();) { iter = p1.erase(iter); }` (Or just do what the answer suggested: `p1.clear();`) — Scheff's Cat, Jun 13 '21 at 15:38
FYI: [Why should I not #include ?](https://stackoverflow.com/q/31816095/7478597) — Scheff's Cat, Jun 13 '21 at 15:39
Suppose I have 7 instances of 2 and 3 instances of 5 in the multiset. I want to put an if condition i.e. if I encounter more than 1 instance of any number then I will limit it to only 1 instance of that number. — Kevin Martin, Jun 14 '21 at 12:13
In that case, you should use a `std::set` because that is actually what matches your requirement. You could use also a `std::map` to map the key to the number of occurrences if you like. — Scheff's Cat, Jun 14 '21 at 12:21
Please, don't update a pending question to a new topic. That makes existing answers non-matching. Instead, open a new question for a new question. — Scheff's Cat, Jun 14 '21 at 12:22
Got it. Yeah, using maps will give a better solution. Thanks, @Scheffs-cat — Kevin Martin, Jun 14 '21 at 12:42
Does this answer your question? [counting duplicates in c++](https://stackoverflow.com/questions/39676779/counting-duplicates-in-c) — Scheff's Cat, Jul 05 '21 at 09:11

Scheff's Cat · Answer 1 · 2021-07-05T08:33:06.343

My comment:

In that case, you should use a std::set<int> because that is actually what matches your requirement. You could use also a std::map<int, int> to map the key to the number of occurrences if you like.

OPs reply:

Can you add this to a full-fledged answer so that I can accept it for this question?

Here we go:

Just filtering duplicates:

#include <iostream>
#include <set>

int main()
{
  int sample[] = { 5, 2, 3, 3, 2, 2 };
  // add all values at most once
  using Table = std::set<int>;
  Table table;
  for (int value : sample) table.insert(value);
  // output the result
  for (const Table::value_type& entry : table) {
    std::cout << "Value " << entry << "\n";
  }
}

Output:

Value 2
Value 3
Value 5

Demo on coliru

Counting the number of occurrences:

#include <iostream>
#include <map>

int main()
{
  int sample[] = { 5, 2, 3, 3, 2, 2 };
  // add all values at most once but count the number of occurrences
  using Table = std::map<int, unsigned>;
  Table table;
  for (int value : sample) ++table[value];
  // output the result
  for (const Table::value_type& entry : table) {
    std::cout << "Value " << entry.first << " (" << entry.second << " times)\n";
  }
}

Output:

Value 2 (3 times)
Value 3 (2 times)
Value 5 (1 times)

Demo on coliru

The trick:

The std::map::operator[] inserts an element if the key is not yet there. This element (in this case std::pair<const int, unsigned>) is default initialized which grants that it starts as { key, 0 }.

So, there are two cases:

The key is not yet there:
The element is created as { key, 0 } and the value (.second of the element) is incremented immediately which results in { key, 1 }.
The key is already there:
The value (.second of the element) is incremented again.

A variation on filtering duplicates:

This keeps the original input order but removes repetitions (by book-keeping in a separate std::set).

#include <iostream>
#include <set>
#include <vector>

int main()
{
  using Sample = std::vector<int>;
  Sample sample = { 5, 2, 3, 3, 2, 2 };
  // remove duplicates
  using Table = std::set<int>;
  Table table;
  Sample::iterator iterRead = sample.begin();
  Sample::iterator iterWrite = sample.begin();
  for (; iterRead != sample.end(); ++iterRead) {
    if (table.insert(*iterRead).second) *iterWrite++ = *iterRead;
  }
  sample.erase(iterWrite, sample.end());
  // output the result
  for (const Sample::value_type& entry : sample) {
    std::cout << "Value " << entry << "\n";
  }
}

Output:

Value 5
Value 2
Value 3

Demo on coliru

The trick:

std::set::insert() returns a pair of iterator and bool.
The iterator points to the key in the set (inserted or already been there). The bool denotes if the key was inserted (true) or was already there (false).

The other trick:

Just erasing every found duplicate from the std::vector would result in the worse complexity O(n²).

Hence, two iterators are used, one for reading and one for writing. Thereby, every input value which is not yet in the bookkeeping table (and hence occurs the first time) is written back, otherwise not. So, every value which occurred the first time is shifted towards the beginning and appended to the previous values which occurred the first time each. Additionally, the iterWrite points past the last written element after the loop and can be used to erase the rest (which contains left input values which are all duplicates).

The complexity of this algorithm is O(n) – much better than the naive approach.

Btw. the standard algorithms std::remove(), std::remove_if() does it the same way.

Thus, the same algorithm could be achieved with std::remove_if():

#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main()
{
  using Sample = std::vector<int>;
  Sample sample = { 5, 2, 3, 3, 2, 2 };
  // remove duplicates
  using Table = std::set<int>;
  Table table;
  Sample::iterator last
    = std::remove_if(sample.begin(), sample.end(),
      [&](int value) { return !table.insert(value).second; });
  sample.erase(last, sample.end());
  // output the result
  for (const Sample::value_type& entry : sample) {
    std::cout << "Value " << entry << "\n";
  }
}

Output:

like above

Demo on coliru

Thanks for the answer. This will help other people too. – Kevin Martin Jul 05 '21 at 08:40 — Kevin Martin, Jul 05 '21 at 08:40
You removed the check-mark? Accident or intention? – Scheff's Cat Jul 05 '21 at 08:44 — Scheff's Cat, Jul 05 '21 at 08:44

score 0 · Answer 2 · answered Jul 05 '21 at 12:32

#include <iostream>
#include <set>

using namespace std;

int main()
{
    multiset<int> p1; 

    p1.insert(5);
    p1.insert(2);
    p1.insert(3);
    p1.insert(4);
    p1.insert(2);
    p1.insert(2);

    for (auto iter = p1.begin(); iter != p1.end();)
    {
        p1.count(*iter) > 1 ? iter = p1.erase(iter) : iter++;
    }

    for (auto & iter : p1)
    {
        cout << iter << ", ";
    }
    
    return 0;
}

Counting Duplicates in C++ - multiset?

2 Answers2