1

So I was asked a weird inversion of the K best candidates problem. The normal problem is as follows.

Given a list of 'votes' which are tuples of timestamps and candidates like below:

(111111, Clinton)
(111111, Bush)
...

Return the top K candidates with the most votes.

Its a typical problem and the solution is to use a hashmap of candidates->votes within the timestamp bound also build a min heap of size K where basically the top of the heap is the candidate that is vulnerable to being ejected from the K best candidates.

In the end you return the heap.

But I was asked in the end: Given a list of K candidates, return the timestamp that matches these as the K best candidates. I'm not sure if I'm recalling the question 100% correctly because it would have to either be the first occurrence of these K candidates as the best or I would have been given their vote tally.

Jesse
  • 3,522
  • 6
  • 25
  • 40
SpaceDandy
  • 11
  • 2

1 Answers1

0

If I understand everything, votes is a list of vote tuples that are made up of candidates that are being voted for and the timestamp of the vote taking place. currTime is the timestamp of all of the votes during that timestamp of before it. topCandidates are the candidates with the highest vote at currTime.

Your first question gives you votes and currTime, you are expected to return topCandidates. Your second question gives you votes and topCandidates, and you are expected to return currTime.

Focusing on the second question, I would make a map where the keys are a timestamp and the values are all of the votes taking place at that moment. Also I would create another map where the key is the candidate and the value is the number of votes they have so far. I would traverse the first map in the ascending timestamp order of the first map, get all of the votes that were cast at the timestamp and increment the second map's values by their candidate (key). Before going through the next timestamp, I would create a list of the most voted for candidates with the data in the second map. If that list matches topCandidates, then the last timestamp you traversed through is currTime.

To code this in python:

from collections import Counter, defaultdict
def findCurrTime(votes, topCandidates):
    if not (votes and topCandidates):
        return -1
    votesAtTime = defaultdict(list)
    candidatePoll = Counter()
    k = len(topCandidates)
    for time, candidate in votes: # votes = [(time0, candidate0), ...]
        votesAtTime[time].append(candidate)
    for ts in votesAtTime:
        candidatePoll += Counter(voteAtTime[ts])
        if list(map(lambda pair: pair[0],candidatePoll.most_common(k))) == topCandidates:
            return ts
    # if topCandidates cannot be created from these votes:
    return -1

There are some assumptions that I've made (that you hopefully asked your interviewer about). I assumed that the order of topCandidates mattered which Counter.most_common handled, although it won't handle candidates with number of votes.

The time complexity is O(t * n * log(k)) with t being the number of timestamps, n being the number of votes and k being the size of topCandidates. This is because Counter.most_common looks to be O(n*log(k)) and it can run t times. There are definitely more efficient answers though.

user2361174
  • 1,872
  • 4
  • 33
  • 51
  • Arent you assuming the timestamps returned as keys to your dict are sorted? Also most common is klog(k) here. So why is it not O ( t * n * (klogk))? Also wont there be a step to compare what is returned by most common to topCandidates which is k^2 or atleast with some extra space 2k? Thanks for this general approach so far though – SpaceDandy May 28 '18 at 00:20
  • The keys in dictionaries are traversed in the order that they're made, so as long as the timestamps are sorted in `votes` (which you should ask your interviewer about), then the timestamps should be traversed in ascending order. most_common is O(n*log(k)) because we're looking for the top k values from a list that could be n (at the final timestamp). Similar to how traversing through half of a square matrix is O(n^2). Comparing the most_common and `topCandidates` is simply comparing two lists together, which is O(n) which significantly below the most_common time. – user2361174 May 28 '18 at 02:39