3

This question has been asked before but I cannot find it for C++.

If I have a vector and I have a starting number, does std::algorithm provide me a way to find the next highest missing number?

I can obviously write this in a nested loop, I just cant shake the feeling that I'm reinventing the wheel.

For example, given: vector foo{13,8,3,6,10,1,7,0};

The starting number 0 should find 2.
The starting number 6 should find 9.
The starting number -2 should find -1.

EDIT:

Thus far all the solutions require sorting. This may in fact be required, but a temporary sorted vector would have to be created to accommodate this, as foo must remain unchanged.

Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • Came here in hope for an elegant STL solution that just uses one or two lines. But it seems this does not exist - sad :-) – dhaumann Jan 20 '17 at 09:03
  • @dhaumann Yeah I think mines the simplest thing here and I still felt like it had to be rolled into a function. If your vector can be sorted you can definitely simplify. If you ask the question for a sorted vector and link it here I'll come provide a 2 line STL solution. – Jonathan Mee Jan 20 '17 at 12:34
  • @dhaumann Actually cancel that, you can just sort and then look at the accepted answer here: http://stackoverflow.com/q/27861373/2642059 – Jonathan Mee Jan 20 '17 at 20:25

4 Answers4

6

At least as far as I know, there's no standard algorithm that directly implements exactly what you're asking for.

If you wanted to do it with something like O(N log N) complexity, you could start by sorting the input. Then use std::upper_bound to find the (last instance of) the number you've asked for (if present). From there, you'd find a number that differs from the previous by more than one. From there you'd scan for a difference greater than 1 between the consecutive numbers in the collection.

One way to do this in real code would be something like this:

#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <iterator>

int find_missing(std::vector<int> x, int number) {
    std::sort(x.begin(), x.end());
    auto pos = std::upper_bound(x.begin(), x.end(), number);

    if (*pos - number > 1)
        return number + 1;
    else {
        std::vector<int> diffs;
        std::adjacent_difference(pos, x.end(), std::back_inserter(diffs));
        auto pos2 = std::find_if(diffs.begin() + 1, diffs.end(), [](int x) { return x > 1; });
        return *(pos + (pos2 - diffs.begin() - 1)) + 1;
    }
}

int main() {
    std::vector<int> x{ 13, 8, 3, 6, 10, 1,7, 0};

    std::cout << find_missing(x, 0) << "\n";
    std::cout << find_missing(x, 6) << "\n";
}

This is somewhat less than what you'd normally think of as optimal to provide the external appearance of a vector that can/does remain un-sorted (and unmodified in any way). I've done that by creating a copy of the vector, and sorting the copy inside the find_missing function. Thus, the original vector remains unmodified. The disadvantage is obvious: if the vector is large, copying it can/will be expensive. Furthermore, this ends up sorting the vector for every query instead of sorting once, then carrying out as many queries as desired on it.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • `std::upper_bound` is interesting. But I don't think the comparison function can be tweaked enough to make it work on an unsorted container. What I'd really like is something that would work without sorting, cause I can't sort the `vector`. – Jonathan Mee Jan 27 '15 at 18:44
  • 2
    I had no idea `adjacent_difference` existed – bolov Jan 27 '15 at 19:07
  • @bolov: One of my (many) missions in life is to help make the algorithms hidden in `` a little more visible. :-) – Jerry Coffin Jan 27 '15 at 19:11
  • I will try to understand this some time in the near future :)). Meanwhile I compiled and ran some test cases: and I found it gives wrong output for numbers not in the container: e.g. `(elem:result) (-2:-1 , 5:9 , 20:21)` – bolov Jan 27 '15 at 19:13
  • @bolov: At least as I read the question, those look like correct answers. – Jerry Coffin Jan 27 '15 at 19:17
  • @bolov I didn't really specify what should happen if the range was to be inclusive or exclusive. I intended it to be exclusive. So this is the correct result. – Jonathan Mee Jan 27 '15 at 19:31
4

So I thought I'd post an answer. I don't know anything in std::algorithm that accomplishes this directly, but in combination with vector<bool> you can do this in O(2N).

template <typename T>
T find_missing(const vector<T>& v, T elem){
    vector<bool> range(v.size());
    elem++;

    for_each(v.begin(), v.end(), [&](const T& i){if((i >= elem && i - elem < range.size())range[i - elem] = true;});

    auto result = distance(range.begin(), find(range.begin(), range.end(), false));

    return result + elem;
}
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
3
  • First you need to sort the vector. Use std::sort for that.

  • std::lower_bound finds the first element that is greater or equal with a given element. (the elements have to be at least partially ordered)

  • From there you iterate while you have consecutive elements.

  • Dealing with duplicates: One way is the way I went: consider consecutive and equal elements when iterating. Another approach is to add a prerequisite that the vector / range contains unique elements. I chose the former because it avoids erasing elements.

Here is how you eliminate duplicates from a sorted vector:

v.erase(std::unique(v.begin(), v.end()), v.end());

My implementation:

// finds the first missing element in the vector v
// prerequisite: v must be sorted
auto firstMissing(std::vector<int> const &v, int elem) -> int {
  auto low = std::lower_bound(std::begin(v), std::end(v), elem);

  if (low == std::end(v) || *low != elem) {
    return elem;
  }

  while (low + 1 != std::end(v) &&
         (*low == *(low + 1) || *low + 1 == *(low + 1))) {
    ++low;
  }
  return *low + 1;
}

And a generalized version:

// finds the first missing element in the range [first, last)
// prerequisite: the range must be sorted
template <class It, class T = decltype(*std::declval<It>())>
auto firstMissing(It first, It last, T elem) -> T {
  auto low = std::lower_bound(first, last, elem);

  if (low == last || *low != elem) {
    return elem;
  }

  while (std::next(low) != last &&
         (*low == *std::next(low) || *low + 1 == *std::next(low))) {
    std::advance(low, 1);
  }
  return *low + 1;
}

Test case:

int main() {
  auto v = std::vector<int>{13, 8, 3, 6, 10, 1, 7, 7, 7, 0};    
  std::sort(v.begin(), v.end());

  for (auto n : {-2, 0, 5, 6, 20}) {
    cout << n << ": " << firstMissing(v, n) << endl;
  }

  return 0;
}

Result:

-2: -2  
0: 2  
5: 5  
6: 9  
20: 20  

A note about sorting: From the OP's comments he was searching for a solution that wouldn't modify the vector.

You have to sort the vector for an efficient solution. If modifying the vector is not an option you could create a copy and work on it.

If you are hell-bent on not sorting, there is a brute force solution (very very inefficient - O(n^2)):

auto max = std::max_element(std::begin(v), std::end(v));
if (elem > *max) {
  return elem;
}
auto i = elem;
while (std::find(std::begin(v), std::end(v), i) != std::end(v)) {
  ++i;
}
return i;
bolov
  • 72,283
  • 15
  • 145
  • 224
  • Seems to me as if `upper_bound` makes more sense here than `lower_bound`. If there were multiple instances of the requested number, you'd want the last one, not the first. – Jerry Coffin Jan 27 '15 at 17:43
  • @JerryCoffin I thought about that, but with upper_bound you can't tell if your element is in range. If it isn't then the element is the first missing. – bolov Jan 27 '15 at 17:46
  • Seems like `if (*result == input_number)` is a pretty easy way to figure out whether the returned iterator refers to the input number of a higher number. Also note that in case the number was not present, `lower_bound` and `upper_bound` will both return precisely the same result (an iterator to next larger item than the one requested). – Jerry Coffin Jan 27 '15 at 17:57
  • @JerryCoffin yes. But if there are duplicates in the container, the alg will **fail miserably** regardless of lower_bound of upper_bound. (will find the first duplicate as the first missing) Will edit. – bolov Jan 27 '15 at 17:59
  • @bolov Thanks for the solution! From all the answers it seems the only way to handle this is to sort the `vector`, which saddens me, but it is what it is. – Jonathan Mee Jan 27 '15 at 18:48
  • @bolov I have been trying to leave the `vector` unsorted cause of how much re-coding I'd have to do. If I decide to sort it, I believe I would be better off maintaining a sorted `vector` and using this solution: http://stackoverflow.com/q/27861373/2642059 – Jonathan Mee Jan 27 '15 at 19:07
1

First solution:

Sort the vector. Find the starting number and see what number is next. This will take O(NlogN) where N is the size of vector.

Second solution:

If the range of numbers is small e.g. (0,M) you can create boolean vector of size M. For each number of initial vector make the boolean of that index true. Later you can see next missing number by checking the boolean vector. This will take O(N) time and O(M) auxiliary memory.

Ashot
  • 10,807
  • 14
  • 66
  • 117
  • I think the OP is looking for something more specifically idiomatic-C++-ish, rather than a general description of the basic algorithms that can be used. (But, +1. Not everything benefits from STL-ification, even when possible.) – ruakh Jan 27 '15 at 17:37
  • By the way, your second approach can be tweaked to work even when *M* is not small, by ignoring values outside the range [*x* + 1, *x* + *N*]. – ruakh Jan 27 '15 at 17:42
  • Yeh, and the auxiliary memory will be O(N) insted of O(M) – Ashot Jan 27 '15 at 17:46
  • @Ashot My problem is that I cannot sort the vector, otherwise I'd just maintain a sorted vector and do this: http://stackoverflow.com/q/27861373/2642059 My current solution has been to maintain a `vector`. And that might be as good as it gets :( – Jonathan Mee Jan 27 '15 at 18:37