0

Given 4 billion numbers, how to find a number which is not in these 4 billion numbers? We have only 1GB of memory.

The numbers can be non-consecutive.

How to do the same in 10MB of memory?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Bhushan
  • 18,329
  • 31
  • 104
  • 137
  • 1
    Where are these 4 bilion numbers given? a DB? a file? an array? network? where – AbiusX Mar 15 '11 at 02:35
  • Are the numbers sequential? It's not too difficult if the numbers are sequential, a little more so if they're not. – rsbarro Mar 15 '11 at 02:36
  • 2
    I think the intent of the problem is the given numbers are randomly distributed within the range, eg. we have 900 numbers in the range 1 to 1000. How do we solve this most efficiently with 100 numbers worth of storage? and how do you solve it with just a few numbers worth of storage. Just taking a max() may not produce an answer, as 1000 may be in the set. – ndp Mar 15 '11 at 02:39
  • 1
    @Abius: i guess we can assume that those are on disk, since we have constraint of RAM – Bhushan Mar 15 '11 at 02:45
  • @rsbarro: no, numbers are definitely not sequential, or else its trivial – Bhushan Mar 15 '11 at 02:46
  • If you mean that you have a number between 0 and 2^32-1, then you simply make a BitVector (or equivalent) and put at 1 the bits for the number you have. The vector is 512mb. – xanatos Mar 15 '11 at 18:41

6 Answers6

2

Assuming that this is a routine you are just going to run once, use the amount of memory available as a limiting factor. Load the numbers into an array, up to the amount of memory you have available. Sort the array using your favorite sorting algorithm. Do a binary search to see if the value is present. If it's there, you're done, if not, then clear the array and start loading numbers from the file in the last place you left off. Repeat the process until you find a match or you reach the end of the file.

For example, if you have 1 GB to work with and the numbers are 4 bytes large (say, C# int), set the upper array bound at something like 1024 ^ 3 / 4 = 268435456 * i (where i is some value < 1 to make sure we leave a little memory left over for other processes). Fill the array, sort, check, repeat.

If you only have 10 MB to work with, set the upper array bound to 1024 ^ 2 * 10 / 4 = 10485760 * i.

And really, now that I think about it, since the sort has to touch every value anyway, you're better off just scanning the list and leaving out the sort. The sort would be useful though if you wanted to save the list in ordered sets (up to the size of your array) for later processing. In that case you would also want to store the size of the arrays so you could rely on the fact they were sorted for each consecutive run.

rsbarro
  • 27,021
  • 9
  • 71
  • 75
0

Well, with the assumption that I can choose a number which is larger than the largest of the 4 billion numbers:

set i = 0
for each number:
    load the number into memory
    set i = max(i, number + 1)
Justin
  • 84,773
  • 49
  • 224
  • 367
  • but if the 4 billion numbers already has the MAX_VALUE and MIN_VALUE included, then we need to find the number lesser than MAX_VALUE and greater than MIN_VALUE, since it should fit in the range of an integer. they are asking for 'integer', what do you think? – Bhushan Mar 15 '11 at 02:35
0

Find the maximum of the set of numbers (O(N) in time, constant space) and the number not in the set is max+1.

That doesn't strike me as a very challenging question. Finding the smallest natural number not in the set might be a better question.

tgdavies
  • 10,307
  • 4
  • 35
  • 40
  • I think he means there could be 17, 4, 6, 10, 15 in the set, and how do you determine if 12 is not in the set. – rsbarro Mar 15 '11 at 02:35
0

If the question is "There are 4 billion random numbers in a list, now choose a new random number that is not in the list." then I would Merge sort the list to put it in order O(n lg n). Then walk down the list comparing the current element to the next one in order O(n). Something like...

MergeSort(thelist);

for(int i = 0; i < thelist.Length; i++)
{
  if(thelist[i] + 1 == thelist[i+1])
  {
     //it's just a duplicate element
     continue;
  }
  else if (thelist[i] + 1 != thelist[i+1])
  {
     Console.WriteLine("the number is {0}", thelist[i] + 1);
     break;
  }
jb.
  • 9,921
  • 12
  • 54
  • 90
-1

Considering the broadness of the OP question, One could venture that there are 4 billion random numbers in there and another random number is chosen and need to check wheter or not that new number is already in the list.

If that's the case, a simple Binary Search (considering that the numbers are ordered) should suffice with a a comparison time of O(logN).

Paulo Santos
  • 11,285
  • 4
  • 39
  • 65
  • 1
    I read the question as more of 4 billion random numbers in a list, choose another random number that is not in the list. But you may be right. – jb. Mar 15 '11 at 02:42
  • @jb if that's the case, it's even easier if the number are ordered, just search from the first to last checking the differences between numbers. Of course this works because we all are assuming that we're dealing with NATURAL numbers. If we venture into the REAL numbers, then things starts to get real nasty, as between 1 and 2 there are infinite numbers. – Paulo Santos Mar 15 '11 at 02:59
  • 1
    Requirements are memory not time. Space complexity in worst case is O(n) as in bad malloc. – nate c Mar 15 '11 at 03:00
  • @nate, either way, if we're going to discuss memory allocation, nothing was said about the size of the numbers... is it byte? Int16? Int32? Int64? Nowadays, thanks to managed languages and O.S. we don't need to worry too much about memory allocation because the pagination is done for us. However, the method remains. Searching 4 billion or 10 the algorithim is the same. The only thing that changes is how the vector is loaded in memory. Do we need to allocate it ourselves or let the O.S. do it for us? – Paulo Santos Mar 15 '11 at 03:10
  • @nate, the O.P. was too vage a question to answer precisely. What's the size of the data? Is the processor 8, 16, 32 or 64 bits? Is the pagination done for us? Where are the number stored in the first place? There are a lot of questions to be answered before placing constraint only in memory. – Paulo Santos Mar 15 '11 at 03:13
  • Sigh ... Obviously (too some) the whole point of the question would be to realize that algorithms have different space complexity than their time complexity. I bet to even note that would be acceptable without being sure of which one to choose. To note the big O running time and not the memory requirements of it would be to totally miss the point. (By the way tell me how bytes are in a gigabyte. Now how many sets of 4 billion numbers can you put in that one gigabyte. ;-) – nate c Mar 15 '11 at 03:18
  • hehe... :-P Considering 1KB = 1024B and 1MB = 1024KB... 1GB = 1,073,741,824B. Considering a 32bits unsigned integer we can fit 268,435,456 number in a gigabyte. So to fit 4 billion numbers in memory alltogether we'd need 15GB. Or page the list 15 times. – Paulo Santos Mar 15 '11 at 03:26
-1
  • Goto http://en.wikipedia.org/wiki/Sorting_algorithm.
  • Pick an algorithm where space complexity is 1. Maybe pick an obscure one so you can seem like a real aficionado of algorithms. Or pick the popular one that every has heard of if you want to seem like team player.
  • Memorize the pseudo-code and pretend that you have every sorting algorithm memorized as this what you do for fun to show off in the interview.
  • Profit.
nate c
  • 8,802
  • 2
  • 27
  • 28