0

I wrote the C++ code below to find the highest missing positive integer in a vector. However, it doesn't work if there are duplicate values in the vector. Could anybody suggest an efficient method to remove duplicates without sorting the vector as the logic depends on the order of the vector being maintained.

  • expected worst-case time complexity is O(N);

  • expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

    FirstMissingPositive(std::vector<int> a) {
    
      //size
      int N = a.size();
    
      // a dummy value to replace
      // integers below 0 and above N
      int FLAG = N + 2;
    
      for (int i = 0; i < N; ++i) {
        if (a[i] <= 0 || a[i] > N) {
          a[i] = FLAG;
        }
      }
    
      // Formula loop
      for (int i = 0; i < N; ++i) {
        if (a[i] == FLAG || a[i] == -FLAG) {
          continue;
        }
    
        int value = abs(a[i]);
        a[value - 1] = -abs(a[value -1]);
      }
    
      return N + 1;
    }
    
Alexis Wilke
  • 19,179
  • 10
  • 84
  • 156
arcoxia tom
  • 1,531
  • 4
  • 14
  • 25
  • 2
    What type of efficiency? Space, Time, Complexity? If you don't care about space efficiency, use a set. Iterate through the vector, if the element is in the set, it's a duplicate. If it's not, add it to the set. – Alan Apr 16 '18 at 17:37
  • 1
    If you don't mind allocating additional memory you can keep a second container (either a sorted vector or a set) with the unique values. Check if each value has been seen already, and if not insert it. – Jonathan Wakely Apr 16 '18 at 17:37
  • 2
    @Alan No need to check the set first, just add all elements from the vector to the set. If the set behaves as `std::set` (or `std::unordered_set` since ordering doesn't seem to be important) then attempting to add a second copy will fail gracefully. – Some programmer dude Apr 16 '18 at 17:41
  • 3
    How does your algorithm even work? You are simply returning the size of the vector + 1. What's the logic in that? – smac89 Apr 16 '18 at 17:59
  • 1
    why do you use `abs` several times when you already made sure that all numbers are in `[0,N)` ? – 463035818_is_not_an_ai Apr 16 '18 at 18:34
  • 1
    what is the " highest missing positive integer" ? there is no highest positive integer... can you give an example? – 463035818_is_not_an_ai Apr 16 '18 at 18:36
  • i really tried to understand your code, but to me it seems like you are just randomly swapping the signs in the input vector and then return its size +1 – 463035818_is_not_an_ai Apr 16 '18 at 18:49

3 Answers3

3

You can remove duplicates in place though this approach has efficiency of O( n^2 ).

Here is a demonstrative program.

#include <iostream>
#include <algorithm>
#include <vector>

template <typename ForwardIterator>
ForwardIterator remove_duplicates( ForwardIterator first, ForwardIterator last )
{
    auto new_last = first;

    for ( auto current = first; current != last; ++current )
    {
        if ( std::find( first, new_last, *current ) == new_last )
        {
            if ( new_last != current ) *new_last = *current;
            ++new_last;
        }
    }

    return new_last;
}

int main() 
{
    std::vector<int> v = { 1, 3, 2, 1, 4, 3 };

    for ( int x : v ) std::cout << x << ' ';
    std::cout << std::endl;

    v.erase( remove_duplicates( v.begin(), v.end() ), v.end() );

    for ( int x : v ) std::cout << x << ' ';
    std::cout << std::endl;

    return 0;
}

Its output is

1 3 2 1 4 3 
1 3 2 4
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
1

You can use additional set to mark unique values:

#include <iostream>
#include <vector>
#include <set>

int main()
{
    std::vector<int> v = { 1, 3, 2, 1, 4, 3 };

    for ( int x : v ) std::cout << x << ' ';
    std::cout << std::endl;

    std::set<int> s;

    for (auto iter = v.begin(); iter != v.end(); ) {
        if (s.find(*iter) == s.end()) {
            s.insert(*iter);
            iter++;
        }
        else {
            iter = v.erase(iter);
        }
    }

    for ( int x : v ) std::cout << x << ' ';
    std::cout << std::endl;

    return 0;
}
-2

Use a Bitset (assuming you know a maximum possible integer)

Start with it initialized to zero. For each integer, check if the corresponding place is already set, if it is you have a duplicate, if not set it.

Bitset has constant time lookup and is the most space efficient storage

Martin Beckett
  • 94,801
  • 28
  • 188
  • 263