6

I was asked this question in an interview. Consider the scenario of punched cards, where each punched card has a 64-bit pattern. It was suggested I treat each card as an int, since each int is a collection of bits.

Also, consider that I have an array which already contains 1000 such cards. I have to generate a new element each time that is different from the previous 1000 cards. The integers (a.k.a. cards) in the array are not necessarily sorted.

Moreover, how would that be possible? The question was for C++: where does the 64-bit int come from, and how can I generate a new card that is different from all the elements already present in the array?

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
Cipher
  • 5,894
  • 22
  • 76
  • 112
  • Maybe I misunderstood. Must every newly generated number be different from *all* the previous ones, or do you delete the first element every time? In that case, an O(1000) = O(1) solution can be formulated trivially. – Fred Foo Sep 14 '11 at 09:20
  • @larsmans: The newly generated integer was asked to be different from all the previous ones. – Cipher Sep 14 '11 at 10:24
  • 1
    Related Question: [Find an integer not among four billion given ones](http://stackoverflow.com/questions/7153659/find-an-integer-not-among-four-billion-given-ones) – Christian Ammer Sep 14 '11 at 12:07

13 Answers

5

There are 2^64 64-bit integers, a number so much larger than 1000 that the simplest solution is just to generate a random 64-bit number, and then verify that it isn't in the table of already generated numbers. (The probability of a collision is infinitesimal, but you might as well be sure.)

Since most random number generators do not generate 64-bit values, you are left with either writing your own, or (much simpler) combining values, say by generating 8 random bytes and memcpying them into a uint64_t.

As for verifying that the number isn't already present, std::find is just fine for one or two new numbers; if you have to do a lot of lookups, sorting the table and using a binary search would be worthwhile. Or some sort of a hash table.
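
For illustration, a minimal sketch of this approach, assuming C++11's <random> is available (std::mt19937_64 yields 64-bit values directly, which sidesteps the byte-combining step; the function name is illustrative):

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Draw random 64-bit candidates until one is not already present.
// With only 1000 existing cards, a retry is astronomically unlikely.
uint64_t generateCard(std::vector<uint64_t>& cards) {
    static std::mt19937_64 rng{std::random_device{}()};
    for (;;) {
        uint64_t candidate = rng();
        if (std::find(cards.begin(), cards.end(), candidate) == cards.end()) {
            cards.push_back(candidate);
            return candidate;
        }
    }
}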

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • 1
    But inserting a new element in a sorted table takes O(n) time, just like `std::find`, so you're stuck with O(n) for generate in both cases. – Fred Foo Sep 14 '11 at 09:17
  • 1
    @larsmans But `O(n)` with a very small constant factor and good locality often beats `O(lg n)` with a large constant factor and bad locality. – James Kanze Sep 14 '11 at 09:51
  • 1
    @James 1000 isn't really that small over a large number of operations. lg 1000 is 10, so you'd have to have quite a large constant before O(n) can compete with O(lg N) – flight Sep 14 '11 at 09:57
  • 1
    @quasiverse Remember that the O(n) only applies to the insertion, not the look-up. Inserting into a `map` requires allocation, which can make a really, really large difference with regards to the constant factor. (And of course, if performance really becomes an issue, significantly better solutions are available. But they are more complicated, and thus more prone to error.) – James Kanze Sep 14 '11 at 12:30
3

I may be missing something, but most of the other answers appear to me overly complicated. Just sort the original array and then start counting from zero: if the current count is in the array skip it, otherwise you have your next number. This algorithm is O(n), where n is the number of newly generated numbers: since the original array has a fixed size (1000), both sorting it and skipping existing numbers contribute only constant factors. Here's an example:

#include <algorithm>
#include <iostream>

unsigned array[] = { 98, 1, 24, 66, 20, 70, 6, 33, 5, 41 };
const unsigned arraySize = sizeof(array) / sizeof(array[0]);

unsigned count = 0;
unsigned index = 0;

int main() {
  std::sort(array, array + arraySize);
  while ( count < 100 ) {
    if ( index < arraySize && count > array[index] )
      ++index;                               // passed this array element
    else {
      if ( index >= arraySize || count < array[index] )
        std::cout << count << std::endl;     // count is not in the array
      ++count;
    }
  }
}
Nicola Musatti
  • 17,834
  • 2
  • 46
  • 55
  • overly complicated? Maybe. Faster for the first several cards? Definitely. – Mooing Duck Sep 14 '11 at 23:18
  • 1
    @Mooing Duck: Certainly, but then why code to requirements that are not there? The OP speaks of "new number every time..."; there are no explicit performance requirements, nor an indication of how many numbers will be extracted. In practice, even just comparing each 64-bit unsigned number with all the elements in the array is likely to be fast enough for most purposes. – Nicola Musatti Sep 15 '11 at 07:22
  • I code to my interpretation of what they want more than what they say, since what they say is never quite what they want. I had a homework assignment once where we had to write a function that returned true if a string parameter was greater than 5 characters. An answer that follows specs: `bool IsGreatFive(const char*) {return true;}` – Mooing Duck Sep 15 '11 at 16:12
2

Here's an O(n) algorithm:

#include <algorithm>
#include <cstdint>
#include <vector>

int64_t generateNewValue(const std::vector<int64_t>& list_of_cards)
{
    return *std::max_element(list_of_cards.begin(), list_of_cards.end()) + 1;
}

Note: As @amit points out below, this will fail if INT64_MAX is already in the list.

As far as I'm aware, this is the only way you're going to get O(n). If you want to deal with that (fairly important) edge case, then you're going to have to do some kind of proper sort or search, which will take you to O(n log n).

Oliver Charlesworth
  • 267,707
  • 33
  • 569
  • 680
  • O(n^2) worst case, pretty slow indeed :| – amit Sep 14 '11 at 08:18
  • You could also keep track of the max/min values; then you know for sure that if you choose, say, max+1, it's new, and you then update this as the new max. It'll be just as slow the first time round, but after that it's O(1), though it requires that there is actually sufficient space at the top and bottom of the range. – Dan Sep 14 '11 at 08:20
  • @Oli: true. But he does mention it is interview question, I'd like to think every interview question is actually implicitly saying "how fastest can .... be done? [and how?]" – amit Sep 14 '11 at 08:21
  • @Oli: I did think of this, but didn't give this answer since looping over 1000 elements again and again would surely not be the expected answer. – Cipher Sep 14 '11 at 08:21
  • @Cipher: Ok, see updated answer. I've taken it from O(n^2) to guaranteed O(n). Obviously, if you need to generate more than one new element, then it drops to O(1). But assuming no state, O(n) is the best you can do. – Oliver Charlesworth Sep 14 '11 at 08:23
  • 1
    this [updated solution] will fail if max==MAX_INT_64 and MIN_INT_64 is also in the list. – amit Sep 14 '11 at 08:24
  • it will also fail if used more than once, you need to add it to the list otherwise it will return the same value all the time – Cristian Toader Sep 14 '11 at 08:28
  • @Chris: This requirement was to *generate* a new element. Appending to the list can be achieved trivially as: `list_of_cards.append(generateNewValue(list_of_cards))`. – Oliver Charlesworth Sep 14 '11 at 08:29
  • @Oli: if the function keeps track of a `maximum`, and finds the maximum value used less than `maximum-1`, then it doesn't fail if MAX_INT_64 is in the list. It's much more complicated, but can then eventually return all the numbers. – Mooing Duck Sep 14 '11 at 23:26
2

@arne is almost there. What you need is a self-balancing interval tree, which can be built in O(n lg n) time.

Then take the top node, which will store some interval [i, j]. By the properties of an interval tree, both i-1 and j+1 are valid candidates for a new key, unless i = UINT64_MIN or j = UINT64_MAX. If both are true, then you've stored 2^64 elements and you can't possibly generate a new element. Store the new element, which takes O(lg n) worst-case time.

I.e.: init takes O(n lg n), generate takes O(lg n). Both are worst-case figures. The greatest thing about this approach is that the top node will keep "growing" (storing larger intervals) and merging with its successor or predecessor, so the tree will actually shrink in terms of memory use and eventually the time per operation decays to O(1). You also won't waste any numbers, so you can keep generating until you've got 2^64 of them.
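
For concreteness, here is a sketch of the idea with std::map standing in for the self-balancing tree, storing the used numbers as merged, disjoint intervals keyed by their lower bound (the function names are illustrative, not from the answer):

#include <cstdint>
#include <iterator>
#include <map>
#include <stdexcept>

using Intervals = std::map<uint64_t, uint64_t>;  // lower bound -> upper bound

void insertNumber(Intervals& used, uint64_t x) {
    uint64_t hi = x;
    auto next = used.upper_bound(x);
    // Merge with an interval starting right after x, if any.
    if (next != used.end() && x != UINT64_MAX && next->first == x + 1) {
        hi = next->second;
        next = used.erase(next);
    }
    // Merge with the interval before x if it touches or contains x.
    if (next != used.begin()) {
        auto prev = std::prev(next);
        if (prev->second >= x) return;           // x already stored
        if (prev->second == x - 1) {             // adjacent below: extend
            prev->second = hi;
            return;
        }
    }
    used[x] = hi;
}

uint64_t generate(Intervals& used) {
    if (used.empty()) { insertNumber(used, 0); return 0; }
    auto [i, j] = *used.begin();                 // leftmost used interval
    if (i > 0)          { insertNumber(used, i - 1); return i - 1; }
    if (j < UINT64_MAX) { insertNumber(used, j + 1); return j + 1; }
    throw std::runtime_error("all 2^64 values are in use");
}

As generation proceeds, the leftmost interval keeps absorbing its neighbours, which is exactly the shrinking behaviour described above.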

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • Nice! This should be way faster than my tree, especially when the number of existing elements grows. +1 – arne Sep 14 '11 at 09:21
  • You should be able to get better performance from a sorted std::deque of ranges instead of a tree. Slightly more complicated, but better locality (faster), less memory (faster), no balancing (faster), and generation after that is O(1). But the "range" concept is an awesome idea. – Mooing Duck Sep 14 '11 at 23:22
  • I just realized that the range concept also makes it trivial to return _ANY_ random unused number, not increasing/decreasing like others. (They wouldn't be evenly distributed, and it would greatly increase memory usage until you generated a large number though) – Mooing Duck Sep 14 '11 at 23:31
  • Forgot about std::deque's O(n) insert, so nevermind that, unless you're ok with only generating numbers at the beginning and end like everyone else. – Mooing Duck Sep 14 '11 at 23:36
2

This algorithm has O(N lg N) initialisation, O(1) query and O(N) memory usage. I assume you have some integer type which I will refer to as int64 and that it can represent the integers [0, int64_max].

  1. Sort the numbers
  2. Create a linked list containing intervals [u, v]
  3. Insert [0, first number - 1]
  4. For each of the remaining numbers, insert [prev number + 1, current number - 1], skipping the empty intervals that arise between consecutive numbers
  5. Insert [last number + 1, int64_max]

You now have a list representing the numbers which are not used. You can simply iterate over them to generate new numbers.
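
A sketch of steps 1-5, taking int64 as int64_t and std::list as the linked list, and assuming (as the answer does) that all inputs lie in [0, int64_max]:

#include <algorithm>
#include <cstdint>
#include <list>
#include <vector>

struct Interval { int64_t lo, hi; };             // inclusive [lo, hi]

std::list<Interval> buildFreeList(std::vector<int64_t> used) {
    std::sort(used.begin(), used.end());
    used.erase(std::unique(used.begin(), used.end()), used.end());

    std::list<Interval> free;
    int64_t next = 0;                            // smallest value not yet covered
    for (int64_t u : used) {
        if (u > next)
            free.push_back({next, u - 1});       // gap below this used number
        if (u == INT64_MAX) return free;         // nothing above the maximum
        next = u + 1;
    }
    free.push_back({next, INT64_MAX});
    return free;
}

Generating then means reading free.front().lo, incrementing it, and popping the interval once it is exhausted.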

flight
  • 7,162
  • 4
  • 24
  • 31
  • "Sort the numbers" means O(lg n) temporary space at the very least (or O(n²) time). A linked list of intervals takes O(n) space. But indeed, the generating operation will take O(1) time. – Fred Foo Sep 14 '11 at 11:10
  • @larsmans No, because N is fixed at 1000... but I did say O(N lg N) time for sorting. Perhaps I have a double standard... I'll edit it then. – flight Sep 14 '11 at 11:13
0

I think the way to go is to use some kind of hashing. So you store your cards in buckets based on, let's say, a MOD operation. Until you create some sort of indexing you are stuck with looping over the whole array.

If you have a look at the HashSet implementation in Java you might get a clue.

Edit: I assume you wanted them to be random numbers; if you don't mind a sequence, the MAX+1 solution below is a good one :)
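
In C++, the closest analogue to Java's HashSet is std::unordered_set; a minimal sketch of the random-card interpretation (names are illustrative):

#include <cstdint>
#include <random>
#include <unordered_set>

// Membership is an average O(1) single-bucket probe, not a full scan.
uint64_t newRandomCard(std::unordered_set<uint64_t>& cards) {
    static std::mt19937_64 rng{std::random_device{}()};
    uint64_t candidate;
    do {
        candidate = rng();
    } while (cards.count(candidate) != 0);
    cards.insert(candidate);
    return candidate;
}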

Jan Zyka
  • 17,460
  • 16
  • 70
  • 118
  • And how would the OP find a non-existing element in a hash table? – Fred Foo Sep 14 '11 at 08:35
  • You generate a number, get the result from the hashing function, and to determine whether the card is already there you only need to search a single bucket rather than the whole array – Jan Zyka Sep 14 '11 at 08:40
  • I find it hard to estimate the expected time of this operation, but the worst case for lookup is already O(n) (when all integers happen to map to the same hash value). – Fred Foo Sep 14 '11 at 08:47
  • That's how hash functions work, right :) In the worst case you end up with O(n), but this won't be typical if your cards are random numbers. You can argue that in the worst case you never find a new card because your random number generator always returns '5', and since there is already a '5' you are stuck. Note that I assume (as mentioned in my answer) he should pick a new RANDOM card which is not in the list yet - so the task is really: "I have a random number. Is it already in my set?" – Jan Zyka Sep 14 '11 at 09:03
  • This approach goes from expected O(1) to expected O(n) and takes Theta(n) memory as the table grows. Mine goes from worst-case O(lg n) to expected O(1), time *and* memory. Yours might be faster in some situations, though, so there's a tradeoff per application. – Fred Foo Sep 14 '11 at 09:31
0

You could build a binary tree of the already existing elements and traverse it until you find a node whose depth is not 64 and which has fewer than two child nodes. You can then construct a "missing" child node and have a new element. This should be fairly quick, on the order of O(n) if I'm not mistaken.
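
A sketch of one way to realise this; the node layout and the depth-first search for a missing branch are assumptions, since the answer doesn't spell them out:

#include <cstdint>
#include <memory>

struct Node {
    std::unique_ptr<Node> child[2];              // one level per bit of the value
};

void insert(Node* node, uint64_t x) {
    for (int bit = 63; bit >= 0; --bit) {
        int b = int((x >> bit) & 1);
        if (!node->child[b]) node->child[b] = std::make_unique<Node>();
        node = node->child[b].get();
    }
}

// Find the first node above depth 64 with a missing child; that prefix,
// padded with zeros, is a value not present in the tree.
bool findMissing(const Node* node, int depth, uint64_t prefix, uint64_t& out) {
    if (depth == 64) return false;               // a fully stored number
    for (int b = 0; b < 2; ++b)
        if (!node->child[b]) {                   // gap found
            out = ((prefix << 1) | uint64_t(b)) << (63 - depth);
            return true;
        }
    for (int b = 0; b < 2; ++b)
        if (findMissing(node->child[b].get(), depth + 1,
                        (prefix << 1) | uint64_t(b), out))
            return true;
    return false;
}

Usage: create a Node root, insert() each of the 1000 cards, then call findMissing(&root, 0, 0, out).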

arne
  • 4,514
  • 1
  • 28
  • 47
  • Building a binary tree of n elements takes O(n lg n) time. Also, a node having <2 child nodes doesn't indicate a gap in the range of integers. – Fred Foo Sep 14 '11 at 08:27
  • Building a balanced binary tree is not O(n). – Oliver Charlesworth Sep 14 '11 at 08:28
  • Just wanted to suggest that. Just because the numbers aren't sorted when he gets them doesn't mean he can't sort them himself. – RedX Sep 14 '11 at 08:31
  • How do you construct a new child node, whilst guaranteeing that it's unique? – Oliver Charlesworth Sep 14 '11 at 08:32
  • It's a binary tree, using the bits of the integer as node values. Hence, if we find a node a that has depth of say 63 and only one child (with, say, value 0), we can create the new child by appending a 1 to the bit representation of the path to node a. If we had already encountered the number generated in this way, the leaf would already be present. This contradicts the initial assumption that the leaf is not present. – arne Sep 14 '11 at 08:35
0
#include <cstdint>

uint64_t findUnused(const uint64_t a[], int n) {   // n <= 1000
    bool seen[1001] = { false };
    for (int i = 0; i < n; ++i)
        if (a[i] <= 1000) seen[a[i]] = true;       // only values 0..1000 matter
    uint64_t v = 0;
    while (seen[v]) ++v;                           // some slot must stay false
    return v;
}
Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • Yeah - that's quicker than my solution, which is similar but used an array of integers. You could tweak the range when the boolean array is 'used up' so that more values could be added. Resetting your boolean array will be quicker (memset) than my looping refill with incrementing values. Also, my design as listed has a fatal flaw: since I use the first value in the list as an 'invalid' flag, it must not be included in the invalidation scan. Still, we both hit on the idea of just starting at 0 and working up. – Martin James Sep 14 '11 at 12:29
0

Initialization: Don't sort the list. Create a new array 1000 long containing 0..999. Iterate the list and, if any number is in the range 0..999, invalidate it in the new array by replacing the value in the new array with the value of the first item in the list.

Insertion: Use an incrementing index to the new array. If the value in the new array at this index is not the value of the first element in the list, add it to the list, else check the value from the next position in the new array. When the new array is used up, refill it using 1000..1999 and invalidating existing values as above. Yes, this is looping over the list, but it doesn't have to be done for each insertion.

Near O(1) until the list gets so large that occasionally iterating it to invalidate the 'new' new array becomes significant. Maybe you could mitigate this by using a new array that grows, maybe always the size of the list?
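
A sketch of how this might look (the class and names are illustrative; using the list's first value as the 'taken' marker is safe because that value is itself in use):

#include <cstddef>
#include <cstdint>
#include <vector>

class CandidateArray {
    std::vector<uint64_t>& list;                 // the existing cards
    std::vector<uint64_t> cand;                  // current block of candidates
    uint64_t invalid, base = 0;
    std::size_t pos = 0;

public:
    explicit CandidateArray(std::vector<uint64_t>& l)
        : list(l), invalid(l.front()) { refill(); }

    void refill() {                              // next block: base..base+999
        cand.assign(1000, 0);
        for (uint64_t i = 0; i < 1000; ++i) cand[i] = base + i;
        for (uint64_t x : list)                  // invalidate used values
            if (x >= base && x < base + 1000)
                cand[x - base] = invalid;
        base += 1000;
        pos = 0;
    }

    uint64_t insert() {                          // near O(1) amortised
        for (;;) {
            while (pos < 1000 && cand[pos] == invalid) ++pos;
            if (pos < 1000) break;
            refill();                            // block used up
        }
        uint64_t v = cand[pos++];
        list.push_back(v);
        return v;
    }
};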

Rgds, Martin

Martin James
  • 24,453
  • 3
  • 36
  • 60
0

Put them all into a hash table of size > 1000, and find the empty cell (this is the parking problem). Generate a key for that. This will of course work better for bigger table sizes. The table needs only 1-bit entries.

EDIT: this is the pigeonhole principle. This needs "modulo tablesize" (or some other "semi-invertible" function) for a hash function.

#include <stdlib.h>

unsigned hashtab[1001] = { 0 };
unsigned long long numbers[1000] = { /* ... */ };

void init(void)
{
    unsigned idx;

    /* Count how many numbers fall into each residue class mod 1001. */
    for (idx = 0; idx < 1000; idx++)
        hashtab[numbers[idx] % 1001] += 1;
}

unsigned long long generate(void)
{
    unsigned idx;

    /* 1000 numbers, 1001 classes: at least one class must be empty. */
    for (idx = 0; idx < 1001; idx++)
        if (!hashtab[idx]) break;

    /* Any value congruent to idx mod 1001 is new. */
    return idx + (unsigned long long)rand() * 1001;
}
wildplasser
  • 43,142
  • 8
  • 66
  • 109
0

Based on the solution here: question on array and number

Since there are 1000 numbers, if we consider their remainders with 1001, at least one remainder will be missing. We can pick that as our missing number.

So we maintain an array of counts: C[1001], which will maintain the number of integers with remainder r (upon dividing by 1001) in C[r].

We also maintain a set of numbers for which C[j] is 0 (say using a linked list).

When we move the window over, we decrement the count of the first element (say remainder i), i.e. decrement C[i]. If C[i] becomes zero we add i to the set of numbers. We update the C array with the new number we add.

If we need one number, we just pick a random element from the set of j for which C[j] is 0.

This is O(1) for new numbers and O(n) initially.

This is similar to other solutions but not quite.
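
A sketch of this bookkeeping, with std::set standing in for the linked list of empty residues (the struct and names are illustrative):

#include <cstdint>
#include <set>
#include <vector>

struct ResidueTracker {
    std::vector<unsigned> C = std::vector<unsigned>(1001, 0);
    std::set<uint64_t> empty;                    // residues r with C[r] == 0

    explicit ResidueTracker(const std::vector<uint64_t>& nums) {
        for (uint64_t r = 0; r < 1001; ++r) empty.insert(r);
        for (uint64_t x : nums) add(x);
    }
    void add(uint64_t x) {
        uint64_t r = x % 1001;
        if (C[r]++ == 0) empty.erase(r);
    }
    void remove(uint64_t x) {                    // when the window slides
        uint64_t r = x % 1001;
        if (--C[r] == 0) empty.insert(r);
    }
    uint64_t generate() const {
        // With at most 1000 numbers, some residue is always free, and the
        // residue itself is a valid missing number (the answer picks a
        // random member of the set; the first works just as well).
        return *empty.begin();
    }
};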

user127.0.0.1
  • 1,337
  • 10
  • 18
0

How about something simple like this:

1) Partition the array into numbers at or below 1000, and numbers above 1000

2) If all the numbers fit within the lower partition then choose 1001 (or any number greater than 1000) and we're done.

3) Otherwise we know that there must exist a number between 1 and 1000 that doesn't exist within the lower partition.

4) Create a 1000-element array of bools, or a 1000-bit bitfield, or whatnot, and initialize it to all 0's

5) For each integer in the lower partition, use its value as an index into the array/bitfield and set the corresponding bool to true (i.e., do a radix sort)

6) Go over the array/bitfield and pick any unset value's index as the solution

This works in O(n) time, or since we've bounded everything by 1000, technically it's O(1), but O(n) time and space in general. There are three passes over the data, which isn't necessarily the most elegant approach, but the complexity remains O(n).
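
A sketch of steps 2-6 using std::bitset as the bitfield (slot i stands for the value i+1; the function name is illustrative):

#include <bitset>
#include <cstdint>
#include <vector>

uint64_t findMissing(const std::vector<uint64_t>& cards) {
    std::bitset<1000> present;
    for (uint64_t c : cards)
        if (c >= 1 && c <= 1000)
            present.set(c - 1);                  // step 5: mark, radix-style
    if (present.all())
        return 1001;                             // step 2: lower range full
    for (uint64_t i = 0; i < 1000; ++i)
        if (!present[i])
            return i + 1;                        // step 6: first unset slot
    return 1001;                                 // unreachable with 1000 cards
}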

Chris Mennie
  • 564
  • 3
  • 7
-1

You can create a new array with the numbers that are not in the original array, then just pick one from this new array.

O(1)?

Anonimo
  • 97
  • 5
  • 1
    Yes, O(2^64) = O(1), in a trivial sense. – Fred Foo Sep 14 '11 at 09:00
  • You just need to create the array one time and use it as many times as needed, if you delete each number as you pick it. And there are just 1000 cards; big, yes, but just 1000 – Anonimo Sep 14 '11 at 09:04
  • Forgot that; I think the 1000 cards are previously determined. Sorry – Anonimo Sep 14 '11 at 09:07