4

What would be a good way to arrange the elements of a list(with repeating elements) according to the frequency of their occurrence in the list.

I need to use the top 5 frequently occurring items in the list.

I am thinking of using HashMap to count the elements' frequencies by incrementing the corresponding counter each time an element occurs & then doing HashMap iteration 5 times to find the highest freq. element over each iteration.

Rajat Gupta
  • 25,853
  • 63
  • 179
  • 294

4 Answers4

5

How about this approach ?

maintain a map in that holds count

public static Map  <Foo,Integer>;

class Foo implements Comparator<Foo>{  
      private Bar element;


      public int compare(Foo f1, Foo f2){
       return SomeClass.map.get(f1) - SomeClass.map.get(f2);
      }

    }

just update map with update in list.

Wrap the access to List forcefully with the addFooToList() , removeFooFromList() and encapsulate the map updation logic there.

jmj
  • 237,923
  • 42
  • 401
  • 438
5

You can use a Guava Multiset and order it by frequency


And about performance. Of course it depends on how many distinct values you have, but this test code took about a second on my machine. And I'd say that's reasonable enough for 10 M items:

Multiset<Integer> set = HashMultiset.create();
int amount = 10000000;
Random random = new Random();
for (int i = 0; i < amount; i++) {
    set.add(Integer.valueOf(random.nextInt(255)));
}
TreeSet<Entry<Integer>> sortedEntries = Sets.newTreeSet(
        new Comparator<Entry<Integer>>() {
    public int compare(Entry<Integer> a, Entry<Integer> b) {
        return Ints.compare(a.getCount(), b.getCount());
    }
});
Iterables.addAll(sortedEntries, set.entrySet());
for (Entry<Integer> entry : Iterables.limit(sortedEntries, 5)) {
    System.out.println(entry.getElement());
}
Community
  • 1
  • 1
Sean Patrick Floyd
  • 292,901
  • 67
  • 465
  • 588
2

Any comparison-based sorting will incur O(N log N) or worse time complexity, so (asymptotically) these are not good advices.

Your approach has O(N) time complexity, and that's the best you can get. You can try to lower the constant (currently you're making roughly 6*N accesses to elements of the list).

I would do it in two iterations like this: first count the frequencies using a HashMap. Next, iterate over the entries in the map and keep an ordered 5-element array of the 5 most frequent values seen so far. With each new element, check whether the value is more common than the 5th most common so far, and update the "Top 5" if necessary.


UPDATE A simpler solution, of the same time complexity. First, calculate frequencies using HashMap. Next, put all the entries into a PriorityQueue and pop five values. The entries should be value-frequency pairs, comparable by frequency (as in @Jigar's solution). Such an ordering will not be "consistent with equals" (see Comparable for explanation), but that's OK.

Bolo
  • 11,542
  • 7
  • 41
  • 60
0

I would also use a HashMap. I found some code where I did just that:

HashMap<String, Integer> counts = new HashMap<String, Integer>();

void increment(String s) {
    Integer oldCount = counts.get(s);
    if (oldCount == null) {
        counts.put(s, 1);
    } else {
        counts.put(s, oldCount + 1);
    }
}

Listing the elements:

Map.Entry<String, Integer>[] array = new Map.Entry[counts.size()];
counts.entrySet().toArray(array);
Arrays.sort(array, new Comparator<Map.Entry<String, Integer>>() {
    public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
        return b.getValue() - a.getValue();
    }
});
int x = 0, min = 0;
for (Map.Entry<String, Integer> el : array) {
    String k = el.getKey();
    println("Count: " + el.getValue() + "\n" + k + "\n\n");
}
Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132