13

I need to keep top N(< 1000) integers while trying to add values from a big list of integers(around a million sized lazy list). I want to be try adding values to a collection but that needs to keep only the top N(highest values) integers. Is there any preferred data structure to use for this purpose ?

Rajat Gupta
  • 25,853
  • 63
  • 179
  • 294

7 Answers7

11

I'd suggest to use some sorted data structure, such as TreeSet. Before insertion, check the number of items in the set, and if it reached 1000, remove the smallest number if it's smaller than the newly added number, and add the new number.

TreeSet<Integer> set = ...;

public void add (int n) {
    if (set.size () < 1000) {
       set.add (n);
    } else {
       Integer first = set.first();
       if (first.intValue() < n) {
          set.pollFirst();
          set.add (n);
       }
    }
}
Eran
  • 387,369
  • 54
  • 702
  • 768
8

Google Guava MinMaxPriorityQueue class.

You can also use custom sorting by using a comparator (Use orderedBy(Comparator<B> comparator) method).

Note: This collection is NOT a sorted collection.

See javadoc

Example:

@Test
public void test() {
    final int maxSize = 5;

    // Natural order
    final MinMaxPriorityQueue<Integer> queue = MinMaxPriorityQueue
            .maximumSize(maxSize).create();
    queue.addAll(Arrays.asList(10, 30, 60, 70, 20, 80, 90, 50, 100, 40));

    assertEquals(maxSize, queue.size());
    assertEquals(new Integer(50), Collections.max(queue));

    System.out.println(queue);
}

Output:

[10, 50, 40, 30, 20]

Pritesh Mhatre
  • 3,847
  • 2
  • 23
  • 27
2

One efficient solution is a slightly tweaked array-based priority queue using a binary min-heap.

First N integers are simply added to the heap one by one or you can build it from array of first N integers (slightly faster).

After that, compare the incoming integer with the root element (which is MIN value found so far). If the new integer is larger that that, simply replace the root with this new integer and perform down-heap operation (i.e. trickle down the new integer until both its children are smaller or it becomes a leaf). The data structure guarantees you will always have N largest integers so far with average addition time of O(log N).

Here is my C# implementation, the mentioned method is named "EnqueueDown". The "EnqueueUp" is a standard enqueue operation that expands the array, adds new leaf and trickles it up.

I have tested it on 1M numbers with max heap size of 1000 and it runs under 200 ms:

namespace ImagingShop.Research.FastPriorityQueue
{
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.CompilerServices;

    public sealed class FastPriorityQueue<T> : IEnumerable<Tuple<T, float>>
    {
        private readonly int capacity;
        private readonly Tuple<T, float>[] nodes;

        private int count = 0;

        public FastPriorityQueue(int capacity)
        {
            this.capacity = capacity;
            this.nodes = new Tuple<T, float>[capacity];
        }

        public int Capacity => this.capacity;

        public int Count => this.count;

        public T FirstNode => this.nodes[0].Item1;

        public float FirstPriority => this.nodes[0].Item2;

        public void Clear()
        {
            this.count = 0;
        }

        public bool Contains(T node) => this.nodes.Any(tuple => Equals(tuple.Item1, node));

        public T Dequeue()
        {
            T nodeHead = this.nodes[0].Item1;

            int index = (this.count - 1);

            this.nodes[0] = this.nodes[index];
            this.count--;

            DownHeap(index);

            return nodeHead;
        }

        public void EnqueueDown(T node, float priority)
        {
            if (this.count == this.capacity)
            {
                if (priority < this.nodes[0].Item2)
                {
                    return;
                }

                this.nodes[0] = Tuple.Create(node, priority);

                DownHeap(0);

                return;
            }

            int index = this.count;

            this.count++;

            this.nodes[index] = Tuple.Create(node, priority);

            UpHeap(index);
        }

        public void EnqueueUp(T node, float priority)
        {
            int index = this.count;

            this.count++;

            this.nodes[index] = Tuple.Create(node, priority);

            UpHeap(index);
        }

        public IEnumerator<Tuple<T, float>> GetEnumerator()
        {
            for (int i = 0; i < this.count; i++) yield return this.nodes[i];
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void DownHeap(int index)
        {
            while (true)
            {
                int indexLeft = (index << 1);
                int indexRight = (indexLeft | 1);

                int indexMin = ((indexLeft < this.count) && (this.nodes[indexLeft].Item2 < this.nodes[index].Item2))
                    ? indexLeft
                    : index;

                if ((indexRight < this.count) && (this.nodes[indexRight].Item2 < this.nodes[indexMin].Item2))
                {
                    indexMin = indexRight;
                }

                if (indexMin == index)
                {
                    break;
                }

                Flip(index, indexMin);

                index = indexMin;
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void Flip(int indexA, int indexB)
        {
            var temp = this.nodes[indexA];
            this.nodes[indexA] = this.nodes[indexB];
            this.nodes[indexB] = temp;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void UpHeap(int index)
        {
            while (true)
            {
                if (index == 0)
                {
                    break;
                }

                int indexParent = (index >> 1);

                if (this.nodes[indexParent].Item2 <= this.nodes[index].Item2)
                {
                    break;
                }

                Flip(index, indexParent);

                index = indexParent;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

The basic implementation is taken from "Cormen, Thomas H. Introduction to algorithms. MIT press, 2009."

Libor
  • 3,285
  • 1
  • 33
  • 41
1

In Java 1.7 one may use java.util.PriorityQueue. To keep the top N items you need to use reverse comparator, e.g. for integers you order them descending. In this manner the smallest number is always on top and could be removed if to many items in queue.

package eu.pawelsz.example.topn;

import java.util.Comparator;
import java.util.PriorityQueue;

public class TopN {

  public static <E> void add(int keep, PriorityQueue<E> priorityQueue, E element) {
    if (keep == priorityQueue.size()) {
      priorityQueue.poll();
    }
    priorityQueue.add(element);
  }

  public static void main(String[] args) {
    int N = 4;
    PriorityQueue<Integer> topN = new PriorityQueue<>(N, new Comparator<Integer>() {
      @Override
      public int compare(Integer o1, Integer o2) {
        return o1 - o2;
      }
    });

    add(N, topN, 1);
    add(N, topN, 2);
    add(N, topN, 3);
    add(N, topN, 4);
    System.out.println("smallest: " + topN.peek());
    add(N, topN, 8);
    System.out.println("smallest: " + topN.peek());
    add(N, topN, 5);
    System.out.println("smallest: " + topN.peek());
    add(N, topN, 2);
    System.out.println("smallest: " + topN.peek());
  }
}
Paweł Szczur
  • 5,484
  • 3
  • 29
  • 32
1
// this Keep Top Most K Instance in Queue 

public static <E> void add(int keep, PriorityQueue<E> priorityQueue, E element) {
    if(priorityQueue.size()<keep){
       priorityQueue.add(element);
    }
    else if(keep == priorityQueue.size()) {
      priorityQueue.add(element);    // size = keep +1 but
      Object o = (Object)topN.toArray()[k-1];
      topN.remove(o);                // resized to keep         
    }
}
somebody
  • 11
  • 2
0

The fastest way is likely a simple array items = new Item[N]; and a revolving cursor int cursor = 0;. The cursor points to the insertion point of the next element.

To add a new element use the method

put(Item newItem) { items[cursor++] = newItem; if(cursor == N) cursor = 0; }

when accessing this structure you can make the last item added appear at index 0 via a small recalculation of the index, i.e.

get(int index) { return items[ cursor > index ? cursor-index-1 : cursor-index-1+N ]; }

(the -1 is because cursor always point at the next insertion point, i.e. cursor-1 is the last element added).

Summary: put(item) will add a new item. get(0) will get the last item added, get(1) will get the second last item, etc.

In case you need to take care of the case where n < N elements have been added you just need to check for null.

(TreeSets will likely be slower)

Christian Fries
  • 16,175
  • 10
  • 56
  • 67
-2

Your Question is answered here: Size-limited queue that holds last N elements in Java

To summerize it: No there is no data structure in the default java sdk, but Apache commons collections 4 has a CircularFifoQueue.

Community
  • 1
  • 1
Waog
  • 7,127
  • 5
  • 25
  • 37