0

I am trying to implement an algorithm to find N-th largest element that requires minimal memory.

Example: List of integers: 1;9;5;7;2;5. N: 2 after duplicates are removed, the list becomes 1;9;5;7;2. So, the answer is 7, because 7 is the 2-nd largest element in the modified list.

In the below algorithm i am using bubble sort to sort my list and then removing duplicates without using a temp variable, does that make my program memory efficient ? Any ideas or suggestion

type Integer_array is Array (Natural range <>) of Integer;

 procedure FindN-thLargestNumber (A : in out Integer_Array) is
 b : Integer;
 c:Integer;
 begin

//sorting of array
    for I in 0 to length(A) loop
       for J in 1 to length(A) loop
          if A[I] > A[J] then
             A[I] = A[I] + A[J];
             A[J] = A[I] - A[J];
             A[I] = A[I] - A[J];
          end if;
       end loop;
    end loop;

//remove duplicates
    for K in 1 to length(A) loop
       IF A[b] != A[K] then
       b++;
       A[b]=A[K];
     end loop;
    c = ENTER TO FIND N-th Largest number

    PRINT A[b-(c-1)] ;
 end FindN-th Largest Number 
Robin Green
  • 32,079
  • 16
  • 104
  • 187
dev_marshell08
  • 1,051
  • 3
  • 18
  • 40
  • possible duplicate of [Optimal bubble sorting algorithm for an array of arrays of numbers](http://stackoverflow.com/questions/6560140/optimal-bubble-sorting-algorithm-for-an-array-of-arrays-of-numbers) –  Jan 05 '14 at 16:02
  • 1
    Avoiding the temp doesn't save any memory in practice - it all ends up in registers. Bubble sort is dreadful for any purpose (O(N**2) time). A quicksort requires O(log N) extra space but is much faster. – Alan Stokes Jan 05 '14 at 16:06
  • Why was this question tagged Java? – Robin Green Jan 05 '14 at 16:35

4 Answers4

2

If you need the N-th largest element, then you don't need to sort the complete array. You should apply selection sort, but only for the required N steps.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • Depending on the size of N, it may be faster to quicksort it and then pick the Nth element. I.e. if N is larger than logL, where L is the length of the entire array. – vgru Jan 05 '14 at 16:08
  • 2
    @Groo You are right as a general note, but OP's first goal is *minimal memory*. Therefore the choice is bubble sort, selection sort, or insertion sort. Among these, clearly selection sort is the best option due to available short-circuiting. – Marko Topolnik Jan 05 '14 at 16:13
  • I'm pretty sure that for sufficiently large `N`, heapsorting the whole array would beat `N` selections. Of course, predicting that `N` isn't simple. – Steve Jessop Jan 05 '14 at 17:21
  • @SteveJessop This is guaranteed *minimal memory*, therefore it most literally answers OP's question. Other than that, the optimal way is certainly what e.g. Postgresql does to execute `SELECT ... ORDER BY ... LIMIT n`---and that's not a full heap sort, but a heap-backed priority queue. – Marko Topolnik Jan 05 '14 at 17:25
  • @MarkoTopolnik: well, if the questioner's "minimal memory" is taken literally then I suppose you'd have to count the variables between bubble sort and selection sort, see which has fewer. Then realise that the optimizer will surprise you anyway regarding precise stack use in any serious programming language, so the question is pointless ;-) If all that's required is constant memory use, then a truncated heapsort will beat a truncated selection sort for fairly modest values of `N`, I think. – Steve Jessop Jan 05 '14 at 17:31
  • @SteveJessop I'm not that well-acquainted with heap sort, what exactly does its "constant" additional memory comprise? – Marko Topolnik Jan 05 '14 at 17:53
  • @SteveJessop Yeah, saw that a long time ago... thanks. – Marko Topolnik Jan 05 '14 at 20:01
  • But OP is sorting the array in place. With an in place quicksort I don't see what additional space is needed? – vgru Jan 05 '14 at 20:45
  • @Groo Wikipedia: "O(log n) additional space used by the stack during the recursion". – Marko Topolnik Jan 05 '14 at 20:48
  • @Marko: crap, I didn't think of that tiny issue. :-) – vgru Jan 05 '14 at 20:53
  • @Groo Yes, it is an easy-to-miss bit... and admittedly, that's a quite tiny memory investment. – Marko Topolnik Jan 05 '14 at 21:01
2

To find the n'th largest element you don't need to sort the main list at all. Note that this algorithm will perform well if N is smaller than M. If N is a large fraction of the list size then then you will be better off just sorting the list.

You just need a sorted list holding your N largest, then pick the smallest from that list (this is all untested so will probably need a couple of tweaks):

int[n] found = new int[n];

for (int i = 0;i<found.length;i++) {
   found[i] = Integer.MIN_VALUE;
}


for (int i: list) {
   if (i > found[0]) {
       int insert = 0;

       // Find the point in the array to insert the value
       while (insert < found.length && found[insert] < i) {
           insert++;
       }

       // If not at the end we have found a larger value, so move back one before inserting
       if (found[insert] >= i) {
          insert --;
       }

       // insert the value and shuffle everything BELOW it down.
       for (int j=insert;j<=0;j--) {
            int temp = found[j];
            found[j]=i;
            i=temp;
       }
   }
}

At the end you have the top N values from your list sorted in order. the first entry in the list is Nth value, the last entry the top value.
Tim B
  • 40,716
  • 16
  • 83
  • 128
  • A heap / priority queue ought to perform better than a sorted list (depending how big N is). – Alan Stokes Jan 05 '14 at 16:52
  • For large N yes. It would have to be pretty large though to make a difference since you are only checking one entry in the array most times around the loop. Personally I'd consider a `TreeSet`, but he said minimal memory usage though and this is the best you can get for minimal memory really. – Tim B Jan 05 '14 at 16:53
  • Only I just saw the question has been de-tagged as Java so its a good thing I didn't really use any fancy features :) – Tim B Jan 05 '14 at 16:56
  • But the question *is* about *minimal memory* as the first priority. Since the theoretical minimal memory is next to zero (just one temp variable), this is way off target. – Marko Topolnik Jan 05 '14 at 16:57
  • True. This combines excellent performance with excellent memory usages. Bubble sort is the best you can get for memory if that is the only criteria but the performance will be terrible. – Tim B Jan 05 '14 at 17:01
  • This has atrocious worst cases (for example when N == list.length and/or the input data is in decreasing order). So I don't think it's really justified to say that it has "excellent" anything. It's OK for many cases, pretty good for some, and actually good for very small `N`. – Steve Jessop Jan 05 '14 at 17:06
  • Actually I stand by my argument that short-circuited selection sort is better because it is O(N*M), M being array length, instead of just O(M^2). – Marko Topolnik Jan 05 '14 at 17:08
  • Ok, fair point. I was thinking in cases of N being significantly smaller than the size of the collection for this algorithm and should have said that in the answer. – Tim B Jan 05 '14 at 17:09
  • I think your solution doesn't offer any advantage over short-circuited selection sort, in fact. It just uses more memory and the time complexity is the same. BTW if we're into truly optimal solutions, as Alan Stokes notes, sorted heap is the way to go. For example, SQL SELECT ... ORDER BY ... LIMIT N is implemented using a sorted heap. – Marko Topolnik Jan 05 '14 at 17:12
  • @MarkoTopolnik - Actually thinking about it this is almost an implementation of a selection sort, just storing the results in a separate array. Selection sort will need to do N passes over every element M in the array (well each pass reduces M by 1). This will in the best case do rather better but in the worst case do rather worse. I'm going to upvote your selection sort answer now :) – Tim B Jan 05 '14 at 17:15
  • I am returning the favor :) In fact, your approach is mostly a *priority queue* implementation, but with linear search applied to the auxiliary array, instead of the heap used in production software. – Marko Topolnik Jan 05 '14 at 17:23
1

Instead of using bubble sort, use quicksort kind of partial sorting.

  1. Pick a key and using as a pivot move around elements (move all the elements>= pivot to the left of the array)
  2. Count how many unique elements are there that are greater than equal to pivot.
  3. If the number is less than N, then the answer is to the right of the array. Otherwise it is in the left part of the array (left or right as compared to pivot)
  4. Iteratively repeat with smaller array and appropriate N

Complexity is O(n) and you will need constant extra memory.

ElKamina
  • 7,747
  • 28
  • 43
  • 1
    You have skimmed over "counting unique elements"---how do you do that without sorting and without extra memory? At least the time complexity is no longer O(n) then. – Marko Topolnik Jan 05 '14 at 16:24
0

HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.

It sorts in O(n log n) time which I think is optimal time complexity for this problem because of the needed to ignore duplicates. I may be wrong.

Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.

If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699