I recently encountered an algorithm question asking for the top-k frequent elements in better than O(n log n) time. I implemented it in C++ and did some complexity analysis, and I feel it is NOT possible to do better than O(n log n). My code is attached below, with the complexity analysis in comments. I used map; I think unordered_map will do better in practice, but for worst-case analysis unordered_map is worse than map due to hash collisions, which would make the first block O(n^2) in the worst case.
Description: Given a non-empty array of integers, return the k most frequent elements.
Example: Given [1,1,1,2,2,3] and k = 2, return [1,2].
Note: You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
Many solutions found via Google search use a hash table and assume that inserting into a hash table is O(1), which, in my opinion, is not true: in the worst case a single insert costs O(n) when hash collisions happen, so the n inserts cost O(n^2) in total. And Big-O analysis is all about the worst case.
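For reference, this is the counting loop those solutions rely on (a minimal sketch; countFrequencies is just an illustrative name, not from any particular solution). Each insert is O(1) amortized on average, but degrades to O(n) when many keys land in the same bucket, which is what gives the O(n^2) worst case for the whole loop:

#include <unordered_map>
#include <vector>
using namespace std;

// Counting with a hash table: O(n) on average, but each insert can
// degrade to O(n) under adversarial hash collisions, making the
// loop O(n^2) in the worst case.
unordered_map<int, int> countFrequencies(const vector<int>& nums) {
    unordered_map<int, int> freq;
    for (int n : nums) {
        freq[n]++; // average O(1), worst case O(n) per insert
    }
    return freq;
}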
#include <algorithm>
#include <map>
#include <utility>
#include <vector>
using namespace std;

vector<int> topKFrequent(vector<int>& nums, int k) {
    map<int, int> _map;
    vector<int> result;

    // Count occurrences. Each map.find()/insert is O(log n),
    // so this block is O(n log n) overall.
    for (auto n : nums) {
        if (_map.find(n) != _map.end()) {
            // seen before
            _map[n]++;
        } else {
            // new element
            _map[n] = 1;
        }
    }

    // Copy the (element, count) pairs into a vector: O(n).
    vector<pair<int, int>> v;
    for (auto itr = _map.begin(); itr != _map.end(); ++itr) {
        v.push_back(make_pair(itr->first, itr->second));
    }

    // Build a max-heap ordered by count: O(n).
    auto f = [](pair<int, int> p1, pair<int, int> p2) { return p1.second < p2.second; };
    make_heap(v.begin(), v.end(), f);

    // Pop the k most frequent elements: O(k log n).
    while (k) {
        result.push_back(v.front().first);
        // pop_heap is O(log n):
        // 1. swap the root with the last leaf, O(1)
        // 2. sift the new root down to its correct place, O(log n)
        pop_heap(v.begin(), v.end(), f);
        v.pop_back();
        k--;
    }
    return result;
}
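Summing the blocks: O(n log n) + O(n) + O(n) + O(k log n), and since k ≤ n the total is O(n log n). A quick check against the example from the problem statement (assuming the function above is in scope):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> nums = {1, 1, 1, 2, 2, 3};
    vector<int> result = topKFrequent(nums, 2);
    for (int x : result) {
        cout << x << " "; // prints: 1 2
    }
    cout << endl;
    return 0;
}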
Thanks in advance for your response.