3

Suppose I read a stream of integers. The same integer may appear more than once in the stream. Now I would like to keep a cache of N integers that appeared most frequently. The cache is sorted by the frequency of the stream elements.

How would you implement it in Java?

Michael
  • 41,026
  • 70
  • 193
  • 341
  • The interesting part to this is how to handle those numbers that are not in the currently in the top N cache. Is it a requirement to only store a fixed number of distinct integers from the stream in your cache? – Joel Jul 15 '11 at 15:21

5 Answers5

3

You want to use a binary indexed tree, the code in the link is for C++ and should be fairly straightforward to convert into Java (AFAICT the code would be the same):

Paper Peter Fenwick

Implementation in C++

Woot4Moo
  • 23,987
  • 16
  • 94
  • 151
2

Use a Guava Multiset and sort it by frequency

Community
  • 1
  • 1
Sean Patrick Floyd
  • 292,901
  • 67
  • 465
  • 588
2
public class MyData implements Comparable<MyData>{
  public int frequency = 0;
  public Integer data;
  @Override
  public int compareTo(MyData that) {
    return  this.frequency - that.frequency;
  }

}

Have it stored in a PriorityQueue

Nishant
  • 54,584
  • 13
  • 112
  • 127
1

Create an object model for the int, inside create a Count property. Create a SortedVector collection extending the Vector collection. Each time an integer occurs, add it to the vector if it doesn't exist. Else, find it, update the count property += 1, then call Collections.sort(this) within your Vector.

Josh
  • 2,955
  • 1
  • 19
  • 28
1

Do you know the range of the numbers? If so, it might make sense to use an array. For example, if I knew that the range of the numbers was between 0 and 10, I would make an array of size 10. Each element in this array would count the number of times I've seen a given number. Then, you just have to remember the most frequently seen number.

e.g.

array[10];
freq_index = -1;
freq_count = -1;

readVal(int n){
  array[n]+=1;
  if array[n] > freq_count
    freq_index = n;
    freq_count = array[n];
}

Of course, this approach is bad if the distribution of numbers is sparse.

I'd try a priority queue.

David Weiser
  • 5,190
  • 4
  • 28
  • 35