Suppose that millions of clients are sending heartbeats to one server at a fixed interval, and the server judges a client to have failed when it has stopped sending heartbeats for longer than that interval. If the server just maintains a map and keeps iterating over every client to check whether it has timed out, that is O(n) per check, which is horrible with millions of clients. Is there an algorithm, perhaps based on a heap or binary tree, that solves problems like this in O(log(n))? Thanks.
-
What have you tried in code so far? – Dhanuka May 26 '15 at 09:11
-
Would storing the entries in a DB and have your application regularly call a stored procedure to purge the data work? – npinti May 26 '15 at 09:14
-
@npinti - That would still require an O(n) traversal of the records by your DB procedure. – Am_I_Helpful May 26 '15 at 09:14
-
@shekharsuman: Yes, but at least you delegate it to something which is designed to work with vast amounts of data and will not store it all in memory. Using a hash map or a list makes no difference if you need to periodically go through all the elements to check which ones have expired. – npinti May 26 '15 at 09:18
-
What is the time duration between two heartbeats? Can clients choose to have different durations, and even change the duration over time? – Dialecticus May 26 '15 at 09:43
-
@Dialecticus - Logic says that heartbeats might be updating every second, and might be synchronous! But that synchronous part is not an issue here. – Am_I_Helpful May 26 '15 at 09:46
-
@Dialecticus maybe a fixed 30s; in short, less than 60s. – rainer rainer May 26 '15 at 10:29
-
What communication mechanism do you plan to use? UDP messages, TCP, web service? If million users should call in every 30 seconds maybe UDP should be used. Not sure if usual regular servers can handle million simultaneous active connections. – Dialecticus May 26 '15 at 11:02
3 Answers
If you keep a doubly linked list and a hashmap, you can use the hashmap to find the list entry that corresponds to the client that sent a heartbeat. After receiving the heartbeat, you unchain the list entry and put it at the end of the linked list.
The entry at the beginning of the linked list is the candidate "died" client. The linked list can be traversed up to the first client that is still alive.
Some considerations:
Add a client: create a list entry, add it to the hashmap and to the end of the linked list. O(1)
Remove a client: find the client's list entry by a lookup in the hashmap. Unchain the client and remove it from the hashmap. O(1)
Process a heartbeat: find the client's list entry by a lookup in the hashmap. Unchain the list entry and add it to the end of the linked list. O(1)
Find a dead client: look at the first entry of the linked list. O(1)
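In Python, this hashmap-plus-doubly-linked-list combination is exactly what `collections.OrderedDict` provides, so the four operations above can be sketched as follows (the class name and method names are illustrative, not from the answer):

```python
import time
from collections import OrderedDict

class HeartbeatTracker:
    """Hashmap + doubly linked list in one structure: OrderedDict keeps
    clients in heartbeat order (oldest first); every operation is O(1)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = OrderedDict()  # client_id -> last heartbeat time

    def heartbeat(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        self.last_seen[client_id] = now
        # Unchain the entry and re-append it at the end of the list.
        self.last_seen.move_to_end(client_id)

    def remove(self, client_id):
        self.last_seen.pop(client_id, None)

    def pop_dead(self, now=None):
        """Return clients whose last heartbeat is older than the timeout.
        Only the dead clients at the front of the list are visited."""
        now = time.monotonic() if now is None else now
        dead = []
        while self.last_seen:
            client_id, t = next(iter(self.last_seen.items()))
            if now - t <= self.timeout:
                break  # first client still alive; stop traversing
            self.last_seen.popitem(last=False)
            dead.append(client_id)
        return dead
```

Java's `LinkedHashMap` offers the same combination, which is presumably what the comment below about a "LinkedMap" refers to.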

-
First of all: a linked list is not a complex data structure. I use the linked list to keep the clients ordered by heartbeat time (oldest first). The hash is used to find the client entry in the linked list in O(1) time. The determination of dead candidates is done in O(1) time as well. And maintaining the list doesn't depend on the size of the list, hence O(1). – Ronald May 26 '15 at 09:22
-
I use the linked list to reduce the complexity of finding a dead client from O(n) to O(1). This is the classical trade-off between memory and speed. – Ronald May 26 '15 at 09:30
-
@Ronald - Everything is the same as in my answer, except for your added use of a doubly linked list! Why can't the OP store each client's heartbeat in a HashMap? You are claiming that you won't check all the updates. The clients' entries change over time; heartbeat entries are never fixed, they are intended to change every second or some other time unit. Why aren't you understanding this! – Am_I_Helpful May 26 '15 at 09:41
-
@shekar: As a small aside, to explain the doubly linked list: it makes adding and removing items easy, i.e. O(1). But in your solution I don't understand how to find the client that didn't heartbeat soon enough. Of course I can scan the hashmap, but that is an O(n) effort, which is too much for the OP. So that can't be the solution you propose. Please elaborate how to find the "died" candidate with the hashmap solution. – Ronald May 26 '15 at 09:49
-
Well, clients are iterated over in O(n) and their heartbeats are checked every second. If one doesn't send data, obviously the old entry will persist, clearly hinting that no new entry was received before being stored at the suitable index. – Am_I_Helpful May 26 '15 at 09:53
-
@Ronald, oh, sorry, I did not read your answer properly; it is indeed better than my first approach. – Petr May 26 '15 at 11:22
-
I can see how time would be constant using this approach. Normally, removing a client from the list would be an O(n) operation, but if you use the hash map to give a sort of "direct access" to the list node (and with the list being doubly linked) you turn it into an O(1) operation. Great job. – Gentian Kasa May 26 '15 at 15:11
-
This one can be useful. It is easy to implement using a single LinkedMap (actually also a combination of a hash map and a linked list in Java). Thank you! – rainer rainer May 27 '15 at 01:51
Approach 1
The simplest approach that will give you O(log N) performance is to use a priority queue (a heap). In this queue, for each client you keep a pair <last-heartbeat-time, client-id>, with last-heartbeat-time being the UTC time when you last received a heartbeat from that client, and making sure that the elements with smaller (earlier) last-heartbeat-time are at the top of the queue.
Each time a client sends a heartbeat, you change the value in the corresponding pair, thus moving the client to a later position in the queue. And when you want to remove non-active clients, you just extract data from the queue until you reach clients whose last-heartbeat-time is newer than your threshold.
Update is O(log N) per heartbeat (see here or here under "Index priority queue" for more details), removal is O(log N) per removed client. Note that this approach does not require iterating over all clients to look for those that should be removed; it iterates only over those that are actually removed. So if you have N clients, M heartbeat requests in total, and K removals in total, then this approach processes them in O(M log N + K log N) time.
UPD: this solution can also handle the situation where your clients have different times-to-live, that is, where heartbeats from different clients have different expiration times. You just need to store not the last-heartbeat-time, but the expiration-time-of-last-heartbeat.
UPD2: please see Ronald's answer for a similar idea that does not use a heap and achieves O(1) per operation; for the case where all your clients have the same time-to-live it is better than the above (you do not need a heap, because every time you update some value it moves to the end of the queue).
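Approach 1 can be sketched in Python, with one caveat: the standard-library `heapq` has no update-key operation (the answer's links discuss indexed heaps that do), so a common workaround is to push a fresh entry on every heartbeat and lazily discard stale entries when they surface at the top. A sketch under that assumption, with illustrative names:

```python
import heapq

class HeapTracker:
    def __init__(self, threshold):
        self.threshold = threshold
        self.heap = []    # min-heap of (heartbeat_time, client_id) pairs
        self.latest = {}  # client_id -> most recent heartbeat time

    def heartbeat(self, client_id, now):
        self.latest[client_id] = now
        # No decrease-key in heapq: push a new pair; the old one becomes
        # stale and is skipped when popped.
        heapq.heappush(self.heap, (now, client_id))

    def pop_dead(self, now):
        dead = []
        while self.heap and self.heap[0][0] < now - self.threshold:
            t, client_id = heapq.heappop(self.heap)
            if self.latest.get(client_id) != t:
                continue  # stale entry: a newer heartbeat exists
            del self.latest[client_id]
            dead.append(client_id)
        return dead
```

With a true indexed priority queue the heap holds exactly N entries; the lazy variant trades a little extra memory (one entry per unexpired heartbeat) for much simpler code.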
Approach 2
Another approach can be the following. Keep an ordinary queue (FIFO) of all the heartbeats you have received, and a separate array where for each client you store how many heartbeats of that client are currently in the queue.
Every time a new heartbeat arrives, you push it onto the queue and increase that client's count.
Every time you want to remove inactive clients, you start popping data from the queue. You will be popping heartbeats in the order they arrived, oldest first. Every time you pop a heartbeat, you decrease the corresponding value in the array and check whether it has become zero. If it has, then this client is inactive. You continue popping until you reach young enough heartbeats.
This might seem slow, but note that every heartbeat is popped only once, so the total number of pops will not be bigger than the total number of heartbeats received; hence the total complexity of this solution is just O(M)! You cannot have a faster solution, because you need to process all the incoming heartbeats anyway, which is already O(M), so if your application can handle the incoming flux of heartbeats, it should also be able to handle this solution.
Pseudocode for this solution is below:
def receive_heartbeat(hb)
    q.push(hb)
    nActive[hb.client]++

def find_inactive
    while q.front().time < currentTime - threshold
        hb = q.pop()
        nActive[hb.client]--
        if nActive[hb.client] == 0
            mark hb.client as inactive
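The pseudocode above might look like this in runnable Python (a sketch assuming a single-threaded setting; `threshold` and the function names are illustrative):

```python
from collections import deque, defaultdict

q = deque()                  # FIFO of (time, client_id) heartbeats
n_active = defaultdict(int)  # client_id -> heartbeats still in the queue
threshold = 30               # seconds without a heartbeat before "dead"

def receive_heartbeat(client_id, now):
    q.append((now, client_id))
    n_active[client_id] += 1

def find_inactive(now):
    inactive = []
    # Pop heartbeats oldest-first; each heartbeat is popped at most once
    # over the program's lifetime, so total work is O(M).
    while q and q[0][0] < now - threshold:
        _, client_id = q.popleft()
        n_active[client_id] -= 1
        if n_active[client_id] == 0:
            inactive.append(client_id)  # no younger heartbeat remains
    return inactive
```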
-
This can be done easily using a HashMap in O(1), which is a HashMap's look-up time! But iterating over all the clients would be better than your case. Please check my answer. – Am_I_Helpful May 26 '15 at 09:26
-
@shekharsuman , my approach does not require iterating over _all_ clients, it iterates only over those that should be removed. – Petr May 26 '15 at 09:27
-
@Petr - I agree with this, but, overall, how will you determine that a corresponding heartbeat entry should be considered dead? This statement of yours, *Each time a client sends a heartbeat, you change the value in the corresponding pair, thus moving the client into a later position in the queue*, is too broad, and it is also O(n log n): changing all those n entries is O(n) if you check all of them, and each update is O(log n). – Am_I_Helpful May 26 '15 at 09:34
-
@shekharsuman, see, for example, http://stackoverflow.com/questions/17075497/how-to-update-element-priorities-in-a-heap-for-prims-algorithm, or http://stackoverflow.com/questions/714796/priorityqueue-heap-update – Petr May 26 '15 at 09:41
-
@shekharsuman, you do not need to change _all_ `n` entries, you need to change only one on each update. This _can_ be done in `O(log n)` time in a heap. – Petr May 26 '15 at 09:42
-
@Petr - That statement is the culprit! `you do not need to change all n entries, you need to change one only on each update` You need to be aware that heartbeat entries are changing all the time, every second. Updating all those n entries would finally take O(n*log n). In my case, updating a hashmap entry is O(1), based on a unique key for each client, and then updating all those entries is O(n*1) = O(n). I hope this clears your doubt... – Am_I_Helpful May 26 '15 at 09:45
-
@shekharsuman, then this is up to OP to decide which is better. Can he handle `O(log N)` on each update, or can he handle `O(N)` each time he needs to check for inactive. – Petr May 26 '15 at 09:55
-
@Petr - Don't you agree that updates occur every time unit/second? Please answer this and our discussion will be over. Until today I believed that heartbeats update every second, but after this answer I will have to believe that heartbeats can run/stop as and whenever they want. – Am_I_Helpful May 26 '15 at 10:03
-
@shekharsuman, this all depends on the OP's needs. I can easily imagine an application that needs heartbeats only once a day, or even once a week or month. I'm not a mobile developer, but imagine a mobile developer wanting to know how many users, say, have installed his widget on the home screen. He can make all his applications send a heartbeat daily, and obviously there is no need for more frequent requests. – Petr May 26 '15 at 10:07
-
@Petr - Then a mobile developer won't need this kind of application; the OP's intent was not non-real-time analysis, as you can see by re-reading the question. In your free time, please help me visualise a ***real-time*** application that needs a heartbeat only once in a while (day/week, etc.). – Am_I_Helpful May 26 '15 at 10:16
-
"Each time a client sends a heartbeat, you change the value in the corresponding pair" — but I want to know how to find this client in the queue within O(log(n))? The priority queue was already built with last-heartbeat-time as the key. – rainer rainer May 26 '15 at 10:22
-
@rainerrainer, I don't understand your comment. Do you want me to clarify how you do this change? – Petr May 26 '15 at 10:24
-
@Petr your second solution is just like a solution I once used (not for heartbeats, but for a similar situation). It worked, but with many threads doing the updating, the locking logic got a little messy (with my limited IQ = =). I don't even know if my code was correct, but it worked. I want to find a better solution this time. – rainer rainer May 26 '15 at 10:39
-
@rainerrainer, I'm afraid you will have to handle locking logic anyway if you want to use multiple threads. – Petr May 26 '15 at 11:04
I would suggest an approach slightly different from the first approach suggested by @Petr.
You would need a HashMap<client, FIFO<heartbeatDate>> clientHeartbeats and a Heap<heartbeatDate, client> heartbeats ordered ascending by heartbeatDate (the first element is the earliest heartbeat).
When a client sends a heartbeat, you insert it into heartbeats (time complexity = O(log(n))) and into clientHeartbeats (time complexity = O(1)).
When the first heartbeat in heartbeats falls outside the given time interval, you just remove it from the heap (time complexity = O(log(n))) and remove the heartbeat from clientHeartbeats[removed heartbeat's client] (time complexity = O(1)). If that list becomes empty, then that particular client has not sent a heartbeat in the indicated time interval.
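A sketch of this variant in Python, using `heapq` for the heap and a `deque` per client (the names mirror the answer's `heartbeats` and `clientHeartbeats`; `INTERVAL` and the function names are illustrative):

```python
import heapq
from collections import defaultdict, deque

client_heartbeats = defaultdict(deque)  # client -> FIFO of heartbeat times
heartbeats = []                         # min-heap of (time, client)
INTERVAL = 30                           # allowed gap between heartbeats

def on_heartbeat(client, now):
    heapq.heappush(heartbeats, (now, client))  # O(log n)
    client_heartbeats[client].append(now)      # O(1)

def expire(now):
    inactive = []
    while heartbeats and heartbeats[0][0] < now - INTERVAL:
        t, client = heapq.heappop(heartbeats)  # O(log n)
        client_heartbeats[client].popleft()    # O(1): oldest entry first
        if not client_heartbeats[client]:
            # No heartbeat from this client within the interval.
            inactive.append(client)
            del client_heartbeats[client]
    return inactive
```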
Let me know if anything is unclear.
