I recently encountered an algorithm question asking for the top-k frequent elements in better than O(n log n) time. I implemented it in C++ and did some complexity analysis, and I feel it is NOT possible to do better than O(n log n). My code is attached below, with the complexity analysis in comments. I used map; I think unordered_map will do better in practice, but for worst-case analysis unordered_map is worse than map due to hash collisions, which would make the first block O(n^2) in the worst case.
Description: Given a non-empty array of integers, return the k most frequent elements.
Example: Given [1,1,1,2,2,3] and k = 2, return [1,2].
Note: You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
Many solutions found via Google search use a hash table and assume that inserting into a hash table is O(1), which, in my opinion, is not true: in the worst case a single insert costs O(n) when hash collisions happen, so the n inserts cost O(n^2) in total. And Big-O analysis is all about the worst case.
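For reference, this is the counting loop those solutions rely on (a minimal sketch; countFrequencies is just an illustrative name, not from any particular solution). Each insert is O(1) amortized on average, but degrades to O(n) when many keys land in the same bucket, which is what gives the O(n^2) worst case for the whole loop:

#include <unordered_map>
#include <vector>
using namespace std;

// Counting with a hash table: O(n) on average, but each insert can
// degrade to O(n) under adversarial hash collisions, making the
// loop O(n^2) in the worst case.
unordered_map<int, int> countFrequencies(const vector<int>& nums) {
    unordered_map<int, int> freq;
    for (int n : nums) {
        freq[n]++; // average O(1), worst case O(n) per insert
    }
    return freq;
}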
#include <algorithm>
#include <map>
#include <utility>
#include <vector>
using namespace std;

vector<int> topKFrequent(vector<int>& nums, int k) {
    map<int, int> _map;
    vector<int> result;

    // Count occurrences. Each map.find()/insert is O(log n),
    // so this block is O(n log n) overall.
    for (auto n : nums) {
        if (_map.find(n) != _map.end()) {
            // seen before
            _map[n]++;
        } else {
            // new element
            _map[n] = 1;
        }
    }

    // Copy the (element, count) pairs into a vector: O(n).
    vector<pair<int, int>> v;
    for (auto itr = _map.begin(); itr != _map.end(); ++itr) {
        v.push_back(make_pair(itr->first, itr->second));
    }

    // Build a max-heap ordered by count: O(n).
    auto f = [](pair<int, int> p1, pair<int, int> p2) { return p1.second < p2.second; };
    make_heap(v.begin(), v.end(), f);

    // Pop the k most frequent elements: O(k log n).
    while (k) {
        result.push_back(v.front().first);
        // pop_heap is O(log n):
        // 1. swap the root with the last leaf, O(1)
        // 2. sift the new root down to its correct place, O(log n)
        pop_heap(v.begin(), v.end(), f);
        v.pop_back();
        k--;
    }
    return result;
}
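Summing the blocks: O(n log n) + O(n) + O(n) + O(k log n), and since k ≤ n the total is O(n log n). A quick check against the example from the problem statement (assuming the function above is in scope):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> nums = {1, 1, 1, 2, 2, 3};
    vector<int> result = topKFrequent(nums, 2);
    for (int x : result) {
        cout << x << " "; // prints: 1 2
    }
    cout << endl;
    return 0;
}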
Thanks in advance for your response.