5

Given an array A with possible duplicate entries, find the k entries that occur most frequently.

My approach:

Create a MinHeap of k most occurring elements ordered by the frequency. top element obviously being least occurring of rest of the elements. Create a HashMap to keep track of all element counts and whether or not they are in MinHeap.

When reading a new integer:

  • check if it is in HashMap: Increment the count in HashMap
  • also if it is check if it is in Heap :then Increment the count there also and heapify.
  • if not then compare with root element count and remove the root to add this if necessary. Then heapify.

In the end return MinHeap as desired output.

class Wrapper{
 boolean inHeap;
 int count;
}

This would take O(n+k) space and O(n log k) time complexity. Is there a better way to do space and/or time complexity wise.

m0nish
  • 95
  • 1
  • 5

8 Answers8

7

We can say the space complexity of your approach is O(n), since you can never use more than O(2n) = O(n) memory.


Skip the heap and just create the HashMap.

After you've created the HashMap, you can iterate through it and put all the elements in an array.

Then you can run a selection algorithm such as quickselect on the array to get k-th element, and the first k elements from there (the extension to extract the first k elements via quickselect is fairly trivial, or you can just iterating through again to get them).

Then you sort the k elements, if required.

The running time would be expected O(n) or O(n + k log k) if sorting is required.

The space complexity would be O(n).

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
3

To add to the answer by @Dukeling. I have added the code in C++ below to explain the quickselect method.

Steps:

  1. Get frequency of each unique element by using a map.
  2. Perform quickselect to get the largest kth element.
  3. Select the elements in the vector from 0 to k.

Code:

#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <map>

using namespace std;

map<int,int> m;

void swap(int *a,int *b){
    int temp=*a;
    *a=*b;
    *b=temp;
}

void printelements(vector<int> &temp,int k){
    for(int i=0;i<=k;++i){
        cout<<temp[i]<<endl;
    }
} 

int partition(vector<int> &a, int low,int high){
    int pivot = high-1;
    int i=low-1;
    for(int j=low;j<high-1;j++){
        if(m[a[j]]>=m[a[pivot]]){
            i++;
            swap(&a[i],&a[j]);
        }
    }
    i++;
    swap(&a[i],&a[pivot]);
    return i;
}

void quickselect(vector<int> &temp,int low,int high,int k){
    if(low>high){
        return ;
    }
    int pivot=partition(temp,low,high);
    if(k==pivot){
        printelements(temp,k);
        return;
    }
    else if(k<pivot){
        quickselect(temp,low,pivot,k);
    }
    else{
        quickselect(temp,pivot+1,high,k);
    }
}

void topKelements(int a[],int n,int k){
    if(k<0)return ;
    for(int i=0;i<n;i++){
        if(m.find(a[i])!=m.end()){
            m[a[i]]++;
        }
        else{
            m.insert(pair<int,int>(a[i],1));
        }
    }
    vector<int> temp;
    map<int,int>::iterator it;
    for(it=m.begin();it!=m.end();++it){
        temp.push_back(it->first);
    }
    k=min(k,(int)temp.size()-1);
    quickselect(temp,0,temp.size(),k);
}

int main() {
    int a[] = {1,2,3,4,1,1,2,3,4,4,4,1};
    int k = 2;
    topKelements(a,12,k-1);
}

Output: 1 4 2

Viraj
  • 777
  • 1
  • 13
  • 32
1

There are many different algorithms for the task of determining so-called frequent items, counter-based and sketch-based. Among the counter-based algorithms, the current best one is Space Saving (other algorithms are Lossy Count and Frequent).

Space saving requires in the worst case O(n) time and k+1 counters to find k frequent items in an input consisting of $n$ entries.

Massimo Cafaro
  • 25,429
  • 15
  • 79
  • 93
0

Consider having a Map which maps the number to it's count of occurrence. Maintain a separate int which contains the currently largest count of any number and a List containing each of the numbers with /count/ numbers. This approach will allow you to know the results after a single iteration over all the values. At worst (if all values have 1 occurrence) you use 2x memory (1 entry in both map and list). Even this could be worked around by only starting to add items to the list once a single entry has 2 occurrences.

Brett Okken
  • 6,210
  • 1
  • 19
  • 25
  • 2
    I think you're misunderstanding the problem, or I'm misunderstanding your algorithm. If x appears 100 times, y appears 95 times and z appears 50 times and we want find the 2 most occurring items, we should return x and y. With your algorithm you'll only end up with x (because that's all the list will contain). – Bernhard Barker Jun 04 '14 at 02:49
  • I definitely did misunderstand - I thought only looking for top occurring element. The same principal could occur, however, using a MultiMap rather than an int and a List. This would allow tracking of n largest occurrences. – Brett Okken Jun 04 '14 at 11:14
  • As k increases in size (towards the number of unique values), however, this becomes less useful of an approach – Brett Okken Jun 04 '14 at 11:38
0

I agree that the heap would be complicating it. You could simply MergeSort the array (O(k log k) time), and then run through the array once creating your HashMap (O(n) time)). Total runtime O(n + k*log(k)) = O(k*log(k)).

MikeM
  • 342
  • 2
  • 10
  • 3
    Merge sort on the original array would be O(n log n) (and thus the solution in the question outperforms this). And what do you do after creating the hash-map (the hash-map doesn't give you the k most occurring elements as is)? – Bernhard Barker Jun 04 '14 at 02:34
  • Well, the HashMap would be intended to contain the number of occurrences for values, and then sort the HashMap by value (i.e. number of occurrences). That should give the solution(s), although it seems you're right... this does not outperform the solution in the question. – MikeM Jun 04 '14 at 12:17
0

Well, in your step 2, how could you find an element in heap in log(k) time? Notice heap is not sorted, and at a parent node, there is NO WAY to decide which child to go. You have to iterate all heap members, so total time is O(nk) time.

If change heap to binary search tree (like TreeMap), you can find a frequency in log(k) time. But you have to deal with duplicate keys, since different element can have same counts.

Jun
  • 171
  • 4
  • 16
0
public class ArrayProblems {
    static class Pair {
        int value;
        int count;

        Pair(int value, int count) {
           this.value = value;
           this.count = count;
       }
    }
/*
 * Find k numbers with most occurrences in the given array
 */
public static void mostOccurrences(int[] array, int k) {
    Map<Integer, Pair> occurrences = new HashMap<>();
    for(int element : array) {
        int count = 1;
        Pair pair = new Pair(element, count);
        if(occurrences.containsKey(element)) {
            pair = occurrences.get(element);
            pair.count++;
        }
        else {
            occurrences.put(element, pair);
        }
    }

    List<Pair> pairs = new ArrayList<>(occurrences.values());
    pairs.sort(new Comparator<Pair>() {
        @Override
        public int compare(Pair pair1, Pair pair2) {
            int result = Integer.compare(pair2.count, pair1.count);
            if(result == 0) {
                return Integer.compare(pair2.value, pair1.value);
            }
            return result;
        }
    });

    int[] result = new int[k];
    for(int i = 0; i < k; i++) {
        Pair pair = pairs.get(i);
        result[i] = pair.value;
    }

    System.out.println(k + " integers with most occurence: " + Arrays.toString(result));

}

public static void main(String [] arg)
{
    int[] array  = {3, 1, 4, 4, 5, 2, 6, 1};
    int k = 6;
    ArrayProblems.mostOccurrences(array, k);

    // 3 --> 1
    // 1 --> 2
    // 4 --> 2
    // 5 --> 1
    // 2 --> 1
    // 6 --> 1
}

}

0

You can use hashmap. Increment value in map if its already present. Then Then using lambda sort resulted map with limit to k values.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class MaxRepeating
    {
        static void maxRepeating(int arr[], int n, int k)
        {
            Map<Integer,Integer> map=new HashMap<Integer,Integer>();
            // increment value in map if already present
            for (int i = 0; i< n; i++){
                map.put(arr[i], map.getOrDefault(arr[i], 0)+1);

            }
            map.entrySet().stream()
                    .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                       .limit(k).forEach(System.out::println);

        }

        /*Driver function to check for above function*/
        public static void main (String[] args)
        {

            int arr[] = {7, 10, 11, 5, 2, 5, 5, 7, 11, 8, 9};
            int n = arr.length;
            int k=4;
            maxRepeating(arr,n,k);
        }
    }
Vikas
  • 6,868
  • 4
  • 27
  • 41