0

I've encountered this problem and still trying to solve this problem:

You want to log the number of hits to a site.

Implement two functions, log_hit() which gets called when a hit is registered, and get_hits_in_last_five_minutes() which returns the total number of hits in the last five minutes. Assume that all timestamps come in increasing order.

My idea (attempt to solve the problem):

Datastructure: Array.

My Logic:

When log_hit() is called, we basically store the time in ms (Date().getTime() in ms) in the array and store the hits on the particular second in hashmap.

function Counter() {
  this.map = {};
  this.store = [];
}

Counter.prototype.log_hit = function() {
  const d = new Date().getTime()/1000;
  this.store.push(d);
  this.map[d] = this.map[d] ? this.map[d]++ : 1;
}

Counter.prototype.get_hits_in_last_five_minutes = function() {
 const fiveMin = 60 * 60; //seconds.
 const initalPointer = new Date().getTime() / 1000 - fiveMin;
 // Somehow retrieve the value and return it back
}

However, I don't think this is the most optimal way of solving it , if I wanted to extend the solution for hour or some other granularity.

How would I solve this kind of problem?

Community
  • 1
  • 1
TechnoCorner
  • 4,879
  • 10
  • 43
  • 81
  • Store them in a queue not a map. Then when someone requests the list for the last five minutes loop from the beginning and truncate the list at the first value over 5. This would also prevent the list from getting too big. Every time you log some additional hit as well (say every `hit % 1000 == 0` hits) you would again truncate the list – Mitch Nov 05 '18 at 02:40
  • You are right. I got my solution working! thank you. – TechnoCorner Nov 05 '18 at 04:23
  • @MitchelPaulin Your approach is incorrect. You can't truncate lists because last 5 minutes would have some content that is useful for further calls to `log_hit()` as well. For example, consider hit series as follows(assume every hit in terms of `minutes`)- `[1,2,3,4,5,6,7,8,9,10]`. So, for `7` the hits are `[2,3,4,5,6]`. For 8, those are `[3,4,5,6,7]`. If you truncate, you will lose the valuable data. – nice_dev Nov 05 '18 at 08:58
  • @TechnoCorner can you confirm if every new log_hit() registered will have greater value than the previous one? – nice_dev Nov 05 '18 at 09:32
  • @vivek_23 I was using the assumption that all registered hits would arrive on time. Regardless my solution would still be a reasonable heuristic. However from what I understand they only needed the number of hits and storing every hit and then querying every single hit is not going to work if you don't truncate at some point – Mitch Nov 05 '18 at 13:50
  • @MitchelPaulin Of course, every hit would arrive on time. Also, we can't compromise on accuracy thinking about heuristic. We could use binary search to get the answer in `log(n)` time if OP confirms that every new hit time is larger than the previous one. if the hits are never ending, we could truncate them, but not from the end, it would be from the start assuming every new hit is bigger than the previous one. If this isn't true and hits are in random order, I am afraid we can't truncate anything. – nice_dev Nov 05 '18 at 14:00
  • @vivek_23 why cant we loose values of old hits if all we care about is how many total and how many in the last five minutes. If he wants to store information about every hit he will ever get he should use a DB or something. Regardless the OP was happy with the solution so it solved his problem – Mitch Nov 05 '18 at 15:00
  • @MitchelPaulin `Regardless the OP was happy with the solution so it solved his problem` I don't see how it solved it. I have given you the test case in my first comment to make you understand why it would fail. – nice_dev Nov 05 '18 at 15:08
  • @vivek_23 yes, I can confirm that every new log_hit() will have greater value than previous value. :-D Let me add this to OP. – TechnoCorner Nov 05 '18 at 18:18

2 Answers2

0

Using queue would be the right way to deal with this problem.

   var HitCounter = function() {
            this.queue = [];
   };

/**
 * Record a hit.
        @param timestamp - The current timestamp (in seconds granularity). 
 * @param {number} timestamp
 * @return {void}
 */
HitCounter.prototype.hit = function(timestamp) {
    this.queue.push(timestamp);
};

/**
 * Return the number of hits in the past 5 minutes.
        @param timestamp - The current timestamp (in seconds granularity). 
 * @param {number} timestamp
 * @return {number}
 */
HitCounter.prototype.getHits = function(timestamp) {
    while(this.queue.length && this.queue[0] <= timestamp-300) {
        this.queue.shift();
    }
    return this.queue.length;
};

const counter = new HitCounter();
counter.hit(1);
counter.hit(2);
counter.hit(3);
counter.getHits(4);
counter.hit(300);
counter.getHits(300);
console.log(counter.getHits(301)); // should output 3.
TechnoCorner
  • 4,879
  • 10
  • 43
  • 81
  • This is incorrect. You can't truncate lists because last 5 minutes would have some content that is useful for further calls to `log_hit()` as well. For example, consider hit series as follows(assume every hit in terms of `minutes`)- `[1,2,3,4,5,6,7,8,9,10]`. So, for `7` the hits are `[2,3,4,5,6]`. For `8`, those are `[3,4,5,6,7]`. If you truncate, you will lose the valuable data. – nice_dev Nov 05 '18 at 09:00
  • Even if you want to truncate the list, you will have to do it in a smart way(losing the subarray till maximum possible value which wouldn't affect further calls). – nice_dev Nov 05 '18 at 09:07
0

We need to exploit this fact-

Assume that all timestamps come in increasing order.

Algorithm:

  • We record every hit in an array and increase it's size gradually by 1.
  • To get the no. of hits in the last 5 minutes(excluding current hit), we do a binary search to get our answer since all timestamps come in increasing order, which would be sorted by default.
  • First, we do a binary search on the array to get the last valid upper bound index with respect to time t provided to get_hits_in_last_five_minutes().
  • Upper bound used is the limit till which we could use the hits to judge the no. of calls in the last 5 minutes. This is necessary because the problem statement says get_hits_in_last_five_minutes() which returns the total number of hits in the last five minutes. So, technically it means that it will be used as an API to check how many calls were made till 5 minutes prior to the parameter passed time t. It doesn't guarantee that the time t passed to this method will always be the last inserted timestamp in the counter. Due to this, we need to search for the upper bound in the array, i.e, till which index the hits stored in the array could be counted as valid for our answer.

  • Second, we do a binary search from 0 till upper_bound to get all the valid hits that were under last 5 minutes prior to t.

  • Space complexity: O(n) where n is the no. of hits registered.

  • Time Complexity:
    • O(log(n)) to search for the upper bound.
    • O(log(n)) to get the actual hits registered in last 5 minutes.
    • Total Complexity = O(log(n)) + O(log(n)) = 2 * O(log(n)) = O(log(n))

Note: I converted time t in to seconds while storing and searching.

Code:

function Counter() {
  this.hits = [];
  this.hits_size = 0;
}

Counter.prototype.log_hit = function(t) {
  this.hits[this.hits_size++] = t * 60;
}

Counter.prototype.get_hits_in_last_five_minutes = function(t) {
  if (this.hits_size < 2) return 0;
  t *= 60;
  var upper_bound = this.getUpperBound(t);
  this.last_call_type = 2;
  var low = 0,
    high = upper_bound;
  while (low <= high) {
    var mid = low + parseInt((high - low) / 2);
    if (this.hits[mid] > t - 300) high = mid - 1;
    else if (this.hits[mid] < t - 300) low = mid + 1;
    else return upper_bound - mid + 1;
  }

  return upper_bound - low + 1;
}

Counter.prototype.getUpperBound = function(t) {
  var low = 0,
    high = t > this.hits[this.hits_size - 1] ? this.hits_size - 1 : this.hits_size - 2;
  var ans = 0;
  while (low <= high) {
    var mid = low + parseInt((high - low) / 2);
    if (this.hits[mid] >= t) {
      high = mid - 1;
    } else {
      ans = mid;
      low = mid + 1;
    }
  }

  if (high < 0) return -1;
  return ans;
}


console.log("*****Counter 1******");
var c1 = new Counter();
c1.log_hit(1);
console.log("Registered, 1 = " + c1.get_hits_in_last_five_minutes(1));
c1.log_hit(2);
console.log("Registered, 2 = " + c1.get_hits_in_last_five_minutes(2));
c1.log_hit(3);
console.log("Registered, 3 = " + c1.get_hits_in_last_five_minutes(3));
c1.log_hit(4);
console.log("Registered, 4 = " + c1.get_hits_in_last_five_minutes(4));
c1.log_hit(5);
console.log("Registered, 5 = " + c1.get_hits_in_last_five_minutes(5));
c1.log_hit(6);
console.log("Registered, 6 = " + c1.get_hits_in_last_five_minutes(6));
c1.log_hit(7);
console.log("Registered, 7 = " + c1.get_hits_in_last_five_minutes(7));
c1.log_hit(8);
console.log("Registered, 8 = " + c1.get_hits_in_last_five_minutes(8));
c1.log_hit(9);
console.log("Registered, 9 = " + c1.get_hits_in_last_five_minutes(9));
c1.log_hit(10);
console.log("Registered, 10 = " + c1.get_hits_in_last_five_minutes(10));

console.log("*****Counter 2******");
var c2 = new Counter();
c2.log_hit(2);
console.log("Registered, 2 = " + c2.get_hits_in_last_five_minutes(2));
c2.log_hit(7);
console.log("Registered, 7 = " + c2.get_hits_in_last_five_minutes(7));
c2.log_hit(8);
console.log("Registered, 8 = " + c2.get_hits_in_last_five_minutes(8));
c2.log_hit(9);
console.log("Registered, 9 = " + c2.get_hits_in_last_five_minutes(9));
c2.log_hit(10);
console.log("Registered, 10 = " + c2.get_hits_in_last_five_minutes(10));
c2.log_hit(11);
console.log("Registered, 11 = " + c2.get_hits_in_last_five_minutes(11));
c2.log_hit(12);
console.log("Registered, 12 = " + c2.get_hits_in_last_five_minutes(12));
c2.log_hit(17);
console.log("Registered, 17 = " + c2.get_hits_in_last_five_minutes(17));
console.log("Unregistered, 18 = " + c2.get_hits_in_last_five_minutes(18));
c2.log_hit(19);
console.log("Registered, 19 = " + c2.get_hits_in_last_five_minutes(19));
console.log("Unregistered, 20 = " + c2.get_hits_in_last_five_minutes(20));
c2.log_hit(21);
console.log("Registered, 21 = " + c2.get_hits_in_last_five_minutes(21));
console.log("Unregistered, 6 = " + c2.get_hits_in_last_five_minutes(6));
console.log("Unregistered, 500 = " + c2.get_hits_in_last_five_minutes(500));
console.log("Unregistered, 15 = " + c2.get_hits_in_last_five_minutes(15));
console.log("Registered, 17 = " + c2.get_hits_in_last_five_minutes(17));

Why binary search?

  • You might wonder as to why not loop backwards and get all those values' count that are under t - 300. This way, it could be O(1) per call.
  • Note that our search space is just under t - 300. If this increases, for example, t - 9000 then our backward iteration also increases till 9000, and if the number of calls for get_hits_in_last_five_minutes() happens to be 10000 or more, then the complexity of just looping multiplies to the overall complexity. So, it could be

10000 calls to get_hits_in_last_five_minutes() * 9000

If we use the algorithm described above, it will be

10000 calls to get_hits_in_last_five_minutes() * log(n)

What if the calls are never ending(Infinite)?

  • This depends upon how we choose to use the get_hits_in_last_five_minutes() method.

    • If time t passed to the calls made to get_hits_in_last_five_minutes() will always be in non-decreasing manner, then we could truncate/remove hits from our storage.

      • To do that, we can again do a binary search to get the index of the maximum value from our storage that doesn't come under t - 300. After that, we could just do an array slice and reset this.hits_size to the new length.
        • We need to do this in log_hit() method and not get_hits_in_last_five_minutes() since time t passed to it need not necessarily be a part of our registered hits.
        • Using array slice could add up to the complexity a bit since it returns a shallow copy of the original array and it's complexity is O(N) where N is end - start. See array slice complexity. To avoid this, we can make a linked list and store data in it and use a map to store the nodes. This way, we could reset the list's head and make the trimming/truncating O(1). We will need to maintain a count of truncated nodes too as we can't reset the map with new values.
    • If the time t passed to the calls made to get_hits_in_last_five_minutes() for your website are in any random order, then we can't truncate anything since we need all data to give an answer. In that case, probably store data in the database. The good part is, you could just query your answer from DB and instead of doing computation in javascript.

nice_dev
  • 17,053
  • 2
  • 21
  • 35