6

I have a list of 100 random integers. Each random integer has a value from 0 to 99. Duplicates are allowed, so the list could be something like

56, 1, 1, 1, 1, 0, 2, 6, 99...

I need to find the smallest integer (>= 0) is that is not contained in the list.

My initial solution is this:

vector<int> integerList(100); //list of random integers
...
vector<bool> listedIntegers(101, false);
for (int theInt : integerList)
{
    listedIntegers[theInt] = true;
}
int smallestInt;
for (int j = 0; j < 101; j++)
{
    if (!listedIntegers[j])
    {
        smallestInt = j;
        break;
    }
}

But that requires a secondary array for book-keeping and a second (potentially full) list iteration. I need to perform this task millions of times (the actual application is in a greedy graph coloring algorithm, where I need to find the smallest unused color value with a vertex adjacency list), so I'm wondering if there's a clever way to get the same result without so much overhead?

TrebledJ
  • 8,713
  • 7
  • 26
  • 48
Tyson
  • 1,226
  • 1
  • 10
  • 33
  • Is there any typo? `O(2 * n)` is the same as `O(n)`. – 273K Nov 25 '18 at 11:13
  • Oops, you're correct. – Tyson Nov 25 '18 at 11:15
  • You have to implement heap with priority modified and optimized for your specific task. – 273K Nov 25 '18 at 11:53
  • 1
    Since you want this task performed a large amount of times, you may be able to take advantage of previous iterations (so you may want to specify how that works). But within an isolated iteration, I'm afraid you won't find anything better than your algorithm, especially with such a small list (if you always only have 100 integers). Well, apart from not using `vector`. – Nelfeal Nov 25 '18 at 11:54
  • @S.M. Isn't that nlog(n)? – kabanus Nov 25 '18 at 11:56
  • 4
    If there are only 100 integers, determining the fastest approach is going to come down to benchmarking and micro-optimisations. From a theoretical asymptotic point of view, the code you have is the fastest way to solve the problem. – Bernhard Barker Nov 25 '18 at 12:08
  • One of your vectors is larger than necessary, also fix those magic numbers. Talking about that, is there any relation between the 100 elements in the vector and the 100 possible values its values can take? Also, calling the elements "random" is just to make up an example, or are these elements actually random? If that's the case, you'd have another possibility. – Ulrich Eckhardt Nov 25 '18 at 13:05
  • @kabanus: Yes, but with `n = 100`, making that `O(1)`. ;) Just as Dukeling wrote, asymptotic complexity doesn't apply here. Each such statement starts with a the usually omitted and sometimes forgotten "For every n greater than n_0, the following applies: ..." – Ulrich Eckhardt Nov 25 '18 at 17:39
  • Your problem might be ill-posed or I am misunderstanding what you are doing. The smallest missing integer should be smaller than the minimal value V of the input sequence and be the minimal value of the equal range between 0 and the V. – Hello Everyone Nov 25 '18 at 17:46
  • @UlrichEckhardt Then we can sort in N^2 (on purpose) and find the minimum in O(1)! :). I was thinking the heap is even less useful here for that reason exactly - amortized values would be way off mark. – kabanus Nov 25 '18 at 18:16

6 Answers6

2

It's been a year, but ...

One idea that comes to mind is to keep track of the interval(s) of unused values as you iterate the list. To allow efficient lookup, you could keep intervals as tuples in a binary search tree, for example.

So, using your sample data:

56, 1, 1, 1, 1, 0, 2, 6, 99...

You would initially have the unused interval [0..99], and then, as each input value is processed:

56: [0..55][57..99]
1: [0..0][2..55][57..99]
1: no change
1: no change
1: no change
0: [2..55][57..99]
2: [3..55][57..99]
6: [3..5][7..55][57..99]
99: [3..5][7..55][57..98]

Result (lowest value in lowest remaining interval): 3

1

I believe there is no faster way to do it. What you can do in your case is to reuse vector<bool>, you need to have just one such vector per thread.

Though the better approach might be to reconsider the whole algorithm to eliminate this step at all. Maybe you can update least unused color on every step of the algorithm?

Yola
  • 18,496
  • 11
  • 65
  • 106
0

Since you have to scan the whole list no matter what, the algorithm you have is already pretty good. The only improvement I can suggest without measuring (that will surely speed things up) is to get rid of your vector<bool>, and replace it with a stack-allocated array of 4 32-bit integers or 2 64-bit integers.

Then you won't have to pay the cost of allocating an array on the heap every time, and you can get the first unused number (the position of the first 0 bit) much faster. To find the word that contains the first 0 bit, you only need to find the first one that isn't the maximum value, and there are bit twiddling hacks you can use to get the first 0 bit in that word very quickly.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • I thought about simply flagging bits in unsigned ints, but wasn't sure how to find the first 0 bit without iterating those ints as well. Is there a faster way? – Tyson Nov 25 '18 at 19:41
  • 1
    @Tyson you have to iterate through the ints, but there's only 4 of those!. Then, to find the first 0 bit in `x`, you just have to find the first 1 bit in `~x` using, for example, one of the many methods here: https://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set – Matt Timmermans Nov 25 '18 at 21:41
0

You program is already very efficient, in O(n). Only marginal gain can be found. One possibility is to divide the number of possible values in blocks of size block, and to register not in an array of bool but in an array of int, in this case memorizing the value modulo block.
In practice, we replace a loop of size N by a loop of size N/block plus a loop of size block.
Theoretically, we could select block = sqrt(N) = 12 in order to minimize the quantity N/block + block.
In the program hereafter, block of size 8 are selected, assuming that dividing integers by 8 and calculating values modulo 8 should be fast.
However, it is clear that a gain, if any, can be obtained only for a minimum value rather large!

constexpr int N = 100;
int find_min1 (const std::vector<int> &IntegerList) {
    constexpr int Size = 13;    //N / block
    constexpr int block = 8;
    constexpr int Vmax = 255;   // 2^block - 1

    int listedBlocks[Size] = {0};
    for (int theInt : IntegerList) {
        listedBlocks[theInt / block] |= 1 << (theInt % block);
    }
    for (int j = 0; j < Size; j++) {
        if (listedBlocks[j] == Vmax) continue;
        int &k = listedBlocks[j];
        for (int b = 0; b < block; b++) {
            if ((k%2) == 0) return block * j + b;
            k /= 2;
        }
    }
    return -1;
}
Damien
  • 4,809
  • 4
  • 15
  • 20
0

Potentially you can reduce the last step to O(1) by using some bit manipulation, in your case __int128, set the corresponding bits in loop one and call something like __builtin_clz or use the appropriate bit hack

Surt
  • 15,501
  • 3
  • 23
  • 39
-1

The best solution I could find for finding smallest integer from a set is https://codereview.stackexchange.com/a/179042/31480

Here are c++ version.

int solution(std::vector<int>& A)
{
    for (std::vector<int>::size_type i = 0; i != A.size(); i++) 
    {
        while (0 < A[i] && A[i] - 1 < A.size()
            && A[i] != i + 1
            && A[i] != A[A[i] - 1]) 
        {
            int j = A[i] - 1;

            auto tmp = A[i];
            A[i] = A[j];
            A[j] = tmp;
        }
    }

    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        if (A[i] != i+1)
        {
            return i + 1;
        }
    }
    return A.size() + 1;
}
Syaiful Nizam Yahya
  • 4,196
  • 11
  • 51
  • 71