3

Sort at most 10 million 7-digit numbers. constraints: 1M RAM, high speed. several secs is good.

[Edit: from comment by questioner: the input values are distinct]

Using Bitmap Data Structure is a good solution for this problem.

That means I need a string, which length is at most 10 Million???? So is the RAM enough for this? confused here. Thank you

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
Josh Morrison
  • 7,488
  • 25
  • 67
  • 86

2 Answers2

10

So, there are ~8,000,000 bits in 1MB but if you have arbitrary 7 digit numbers (up to 9,999,999) using a bit vector to do the sort won't work. Similarly, it won't work if some numbers can be repeated because you can only store {0,1} in a bit vector.

But assuming, (what I think your problem is asking) that you have a sequence of integers between 0 and 8,000,000 with no duplicates, you can simply allocate a zeroed array of 8,000,000 bits and then for each number, tick off the corresponding bit in the array. Then outputting the sorted list is simply reading through that array in sequence and outputting the index for each 1 value.

If you are asking the more complex version of the question (0 - 10 million, repeats allowed), then you will need to to sort chunks that fit in ram, store them on disk, and then you can merge these chunks in linear time and streaming (so you don't ever have to store the whole thing in memory). Here is an implementation of a very similar thing in python: http://neopythonic.blogspot.com/2008/10/sorting-million-32-bit-integers-in-2mb.html

Jesse Cohen
  • 4,010
  • 22
  • 25
  • 1
    The solution in Python you've linked takes `~25` seconds to sort `~10**7` integers https://gist.github.com/73d83699c9a15db37f22 – jfs Feb 21 '11 at 00:42
3

Start with a bit array representing the lowest 8 million possible values. Read through the input and set a bit for every value within range. Then output all the numbers for the turned-on bits in sequence. Next Clear the first 2 million bits of the array so that it can represent the highest 2 million possible values. Read through the input and set a bit for every value in the new range. Output all the values in this range. Done.

BlueMonkMN
  • 25,079
  • 9
  • 80
  • 146
  • 1
    Or do two chunks of 5 million, and you'll have some change left from your 1MB RAM restriction. Maybe even enough to store the executable code, stack, etc, so that the program genuinely does run in 1MB RAM :-) – Steve Jessop Feb 21 '11 at 01:44
  • BTW, optimized special case, if you know you have 10 million unique 7-digit non-negative integers, just print all the integers from 0 to 9,999,999 in order without reading the input :) – BlueMonkMN Dec 07 '17 at 20:43