How often does a Worker ping the Master to check that the Master is alive? Or is it the Master (resource manager) that pings the Workers to check their liveness and spawns new ones if any are dead? Or is it both?
Some info: standalone cluster
- 1 Master: 8 cores, 12 GB
- 32 Workers: 8 cores and 8 GB each
My main problem is the following sequence of events:
- Master M is running with 32 Workers.
- 03:55:00 AM: Workers 1 and 2 die, so the cluster is down to 30 Workers.
- 03:55:12 AM: Worker 1' comes up and connects to M.
- 03:55:16 AM: Worker 2' comes up and connects to M.
- 03:56:00 AM: Master M dies.
- 03:56:30 AM: New master NM comes up. Workers 1' and 2' do NOT connect to NM; the remaining 30 Workers do.

So NM now has 30 Workers.
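To make the timing explicit, here is a small sketch that replays the timeline above and computes the gaps between events. It only uses the timestamps listed there; the event names are hypothetical labels, and nothing here models the cluster software itself.

```python
from datetime import datetime

# Timestamps from the timeline above. Only the time of day matters,
# so strptime fills in its default date (1900-01-01).
FMT = "%H:%M:%S"
events = {
    "workers_1_2_died": "03:55:00",
    "worker_1_prime_joined_M": "03:55:12",
    "worker_2_prime_joined_M": "03:55:16",
    "master_M_died": "03:56:00",
    "new_master_NM_up": "03:56:30",
}
t = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def gap(a: str, b: str) -> int:
    """Seconds elapsed from event a to event b."""
    return int((t[b] - t[a]).total_seconds())


gaps = {
    "worker 1' registered before M died": gap("worker_1_prime_joined_M", "master_M_died"),
    "worker 2' registered before M died": gap("worker_2_prime_joined_M", "master_M_died"),
    "cluster had no master for": gap("master_M_died", "new_master_NM_up"),
}

if __name__ == "__main__":
    for label, seconds in gaps.items():
        print(f"{label}: {seconds} s")
```

So Workers 1' and 2' had been registered with M for 48 s and 44 s respectively before M died, and the cluster was without a master for 30 s.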
Why won't those two Workers connect to the new master NM, even though master M is definitely dead?
PS: I have a load balancer (LB) set up in front of the Master, so whenever a new master comes up, the LB starts pointing to it.