Repeated elements in a std::vector

Question

I have a std::vector and I want to check all the elements in it. If a certain element appears more than once, I signal an error.

This is how I did it:

std::vector<std::string> test;
test.push_back("YES");
test.push_back("YES");

for(int i = 0; i < test.size(); i++)
{
    if(test[i] > 1)
    {
        DCS_LOG_DEBUG("ERROR WITH COUNT")
    }
}

This did not work though I know how to count using the std::vector::count() method. But I want to get the count for each element, as opposed to counting everything... any ideas?

score 7 · Accepted Answer · answered Mar 13 '12 at 13:59

The simplest way is to std::sort the vector and then use std::adjacent_find.

However, if you don't want to sort the vector, you can do something like this in C++11:

#include <unordered_map>
#include <functional> // For std::hash<std::string>.
#include <string>
#include <iostream>

int main() {

    // Test data.
    std::vector<std::string> v;
    v.push_back("a");
    v.push_back("b");
    v.push_back("c");
    v.push_back("a");
    v.push_back("c");
    v.push_back("d");
    v.push_back("a");

    // Hash function for the hashtable.
    auto h = [](const std::string* s) {
        return std::hash<std::string>()(*s);
    };

    // Equality comparer for the hashtable.
    auto eq = [](const std::string* s1, const std::string* s2) {
        return s1->compare(*s2) == 0;
    };

    // The hashtable:
    //      Key: Pointer to element of 'v'.
    //      Value: Occurrence count.
    std::unordered_map<const std::string*, size_t, decltype(h), decltype(eq)> m(v.size(), h, eq);

    // Count occurances.
    for (auto v_i = v.cbegin(); v_i != v.cend(); ++v_i)
        ++m[&(*v_i)];

    // Print strings that occur more than once:
    for (auto m_i = m.begin(); m_i != m.end(); ++m_i)
        if (m_i->second > 1)
            std::cout << *m_i->first << ": " << m_i->second << std::endl;

    return 0;

}

This prints:

a: 3
c: 2

I didn't actually benchmark it, but this has a chance for being rather performant, for following reasons:

Assuming the actual vector elements do not produce pathologically lopsided hashes, this is actually an O(n) algorithm, as opposed to O(n*log(n)) for sorting.
We are using the hashtable of pointers to strings, not strings themselves, so there is no unnecessary copying taking place.
We can "pre-allocate" hashtable buckets (we pass v.size() when constructing m), so hashtable resizes are minimized.

Sebastian Mach · Answer 2 · 2012-03-14T07:28:08.870

Specific element

Count is the standard way to go:

#include <algorithm>
...

    if (count (test.begin(), test.end(), "YES") > 1)
        std::cerr << "positive\n";

If you need more performance, you can do it the classic way:

bool exists = false;
for (auto const& v : test) {
    if (v == "YES") {
        if (exists) {
            std::cerr << "positive\n";
            break;
        }
        else exists = true;
    }
}

Any element multiple times

For large vectors, try std::set:

std::set<std::string> exists;
for (auto const &v : test) {
    if (!exists.insert(v).second)
        std::cerr << "positive\n";
}

In this approach, if you also want to be able to recognize whether you already mentioned its non-uniqueness, you may want to use std::multiset:

const std::multiset<std::string> counts (test.begin(), test.end());
for (auto const &v: test)
    if (counts.count (v) == 2) std::cerr << "meh\n";

If the container is small, and you just want to see if any element is there more than once:

auto multitimes = [&test] (std::string const &str) {
    return count(test.begin(),test.end(),str)>1;
};
if (any_of (test.begin(), test.begin(), multitimes))
    std::cerr << "something was there more than once\n";

Don't forget to put it into the set if not already in, or just use `exists.insert(v).second` instead of `exists.find(v) != exists.end()`. — Christian Rau, Mar 13 '12 at 14:32
Doing find and then insert is surely not very efficient, since it traverses the set two times, why not just check insert's return value? — Christian Rau, Mar 13 '12 at 14:44
@ChristianRau: Thanks for the hint. This is also led me to the recommendation of multiset to detect if a message was already printed. — Sebastian Mach, Mar 13 '12 at 14:52

perreal · Answer 3 · 2012-03-13T12:43:17.060

4

You can use std::map and define a mapping from a key (string) to a count (int):

#include <map>
#include <string>
/* ... */
std::map<std::string, int> count_map;

/* ... */

count_map[key]++;

edited Mar 13 '12 at 12:43

answered Mar 13 '12 at 12:04

perreal

94,503
21
155
181

5

You can get rid of that `if` block; `count_map[key]++` will do the right thing if the key is missing. – Mike Seymour Mar 13 '12 at 12:07
1

`map` is too much for `more than once` – Sebastian Mach Mar 13 '12 at 12:40
@phresnel, if it is something I might do with perl, map is good. – perreal Mar 13 '12 at 12:48
1

@perreal: So that my batteries discharge faster and the system performance rots. – Sebastian Mach Mar 13 '12 at 13:00
1

Why don't you use a multiset? `std::multiset s(test.begin(), test.end());` Then getting the number of occurances of a string is simply a count call: s.count("key"); – Ivaylo Strandjev Mar 13 '12 at 14:44

score 2 · Answer 4 · answered Mar 13 '12 at 11:59

2

use std::count to count elements: http://www.cplusplus.com/reference/algorithm/count/

http://en.cppreference.com/w/cpp/algorithm/count

answered Mar 13 '12 at 11:59

Stefan Birladeanu

294
2
10

Who upvoted an answer that doesn't address the actual question? – Christian Rau Mar 13 '12 at 14:41
@Christian Rau I didn't saw the part with the std:count in the description when I answered it.. Hmm, was it there at that moment? – Stefan Birladeanu Mar 13 '12 at 14:49
Don't know, maybe not, but the actual question has always been the same and never been solvable by just aplying `std::count` alone. – Christian Rau Mar 13 '12 at 15:02
@Christian Rau if you use the Std:count in the if check for each item, it does – Stefan Birladeanu Mar 13 '12 at 15:13

score 2 · Answer 5 · edited Mar 13 '12 at 14:42

2

The easiest way to do what you want is to sort the array and then see which elements are met more than once. If you do not want to modify the array itself, you will have to create a copy. This is an O(n * lg n) solution with no extra space if you don't care about the order, and with O(n) extra space if you do.

sort(test.begin(), test.end());

// If you only care if there is a repeated element, do this:
int size = test.size();
unique(test.begin(), test.end());
if (test.size() != size) {
  cout << "An element is repeated.";
}

// If you do care which elements are repeated, do this:
for (unsigned index = 1; index < test.size(); ++index) {
  if (test[index] == test[index - 1] && (index == 1 || test[index - 2] != test[index])) {
     cout << test[index] << " is repeated.";
  }
}

I have provided two solutions: The first is when you only care if a string is repeated, and the second is when you care exactly which strings are repeated.

edited Mar 13 '12 at 14:42

OmarOthman

1,718
2
19
36

answered Mar 13 '12 at 12:00

Ivaylo Strandjev

69,226
18
123
176

2

Or, he can use `std::adjacent_find` in the second case. – Branko Dimitrijevic Mar 13 '12 at 12:03
if you start your index at index = 1 and you check test[index-2] wouldn't this be an array out of bounds? I think you need to start your for loop at index = 2....? – intrigued_66 Mar 13 '12 at 12:48
No it is no problem because of my if condition: `index == 1 || test[index - 2 ] != test[index]` if index is 1 the second statement will never be verified – Ivaylo Strandjev Mar 13 '12 at 13:16
ok will this word if i have multiple strings aswel i.e 5 ? @izomorphius – CodersSC Mar 13 '12 at 13:36
The second version I have provided will print out each repeating string only once(even if met 5 times in input). All the repeating strings will be printed. – Ivaylo Strandjev Mar 13 '12 at 13:55
1

@BrankoDimitrijevic In the first case, too, which prevents copying (or moving) things around. – Christian Rau Mar 13 '12 at 14:28
In fact std::adjacent_find can be used in the first case and is really good idea while in the second case I don't think it is that simple to avoid counting repeating elements more then once. The code will be approximately that hard in the second case I believe... – Ivaylo Strandjev Mar 13 '12 at 14:31
ok yes this works fine and then with what is outputted i can put in a container because i then have to check if any of those occurances equal another set of strings because only a certain amount of strings should appear once and only once @izomorphius – CodersSC Mar 13 '12 at 14:47
I am sorry but I do not actually understand what do you need to do with the strings. You can substitute the print with pushing in a vector, though – Ivaylo Strandjev Mar 13 '12 at 14:51

score 2 · Answer 6 · edited May 23 '17 at 12:15

If you don't mind extra space, try pushing the elements into a map. Whenever you find your element already in the map, you can signal the error directly.

map<string, int> occurrences;

for (vector<string>::const_iterator cit = test.begin(); cit != test.end(); ++cit)
    if ((++occurrences[*cit]) == 2)
        cout << "ERROR"; // You can even signal which element is repeated here easily, using *cit.

Note that this code correctly issues the message only once per repeated item (even if the item is repeated many times), as per the clever amendment by Tony Delroy. Though this way correctly counts the occurrence of every string in the entire collection (which may be something required), this way is subject to overflowing an int if there are 2³¹ copies of the same element (or more). You can use a long long int instead if this is the case and you really want the count of every string.

If you're not interested in the count of every string, an even more efficient way is using a set, as smerlin suggests (because it maintains the string only, not a pair of string and int as the map does), thus reducing space requirements... and issuing the error message whenever you find the item in the set:

set<string> occurrences;

for (vector<string>::const_iterator cit = test.begin(); cit != test.end(); ++cit)
    if (false == occurrences.insert(*cit).second)
        cout << "ERROR"; // You can even signal which element is repeated here easily, using *cit.

If you want to eliminate the problem before it happens, insert the elements into a set instead. It automatically removes duplicates. But take care that elements in a set are sorted, so you'll not preserve the insertion order. If you don't mind it, a set is much better, since searching into it and reading the elements out in sorted order are much more efficient.

+1 for showing complete, clean, concise code; `... == 1` would avoid multiple error messages for the same value - question isn't clear which behaviour's desired. — Tony Delroy, Mar 13 '12 at 12:36
use a `set` instead, requires less memory. he does not need the actual count, he only needs to know if the count is greater than 1 or not. EDIT: hmm yeah `set` would make sense, but the required error handling could would have to go to the location where the elements are inserted into the set, so if thats not possible, your solution is fine. — smerlin, Mar 13 '12 at 12:38
Thanks @TonyDelroy. I think you mean == 2. Anyway, he said that he'll signal a general error. But your comment is completely correct and much more general. I've updated my answer. — OmarOthman, Mar 13 '12 at 12:39

score 1 · Answer 7 · 2012-03-14T04:04:12.467

1

One solution could be using two for loops.... i think it will be simple..

For example:

std::vector<std::string> test;
test.push_back("YES");
test.push_back("YES");

for(int i = 0; i < test.size(); i++)
{
    for(int j = 0; j < test.size(); j++)
    {
         if(i != j)
         {
              if(test[i] == test[j])
              {
                   DCS_LOG_DEBUG("ERROR WITH COUNT")
              }
         }
    }
}

edited Mar 14 '12 at 04:04

answered Mar 13 '12 at 12:11

This code will always produce an error, since test[0] == test[0] is always true. I've tried to edit your post without making the comment, but for an unknown reason, the edit was rejected. j should start from i + 1. – OmarOthman Mar 13 '12 at 15:17
Great, but the point is that starting j from i + 1 is better because you'll always do every comparison twice. For example, (test[1] == test[3]) will be done when (i == 1 && j == 3) and repeated when (i == 3 && j == 1). Besides - even after fixing that - your code issues the error message multiple times for items repeated more than twice, like in { "A", "B", "B", "B" }. – OmarOthman Mar 14 '12 at 07:11

Repeated elements in a std::vector

7 Answers7

Specific element

Any element multiple times

Linked