I've run into an issue with unassigned searchguard shards after adding new nodes to an Elasticsearch cluster. The cluster runs in a public cloud and has allocation awareness enabled with node.awareness.attributes: availability_zone. Search Guard has replica auto-expand enabled by default. The problem reoccurs whenever I have three nodes in one zone and one node in each of the two other zones:
- eu-central-1a = 3 nodes
- eu-central-1b = 1 node
- eu-central-1c = 1 node
I do understand that this cluster configuration is rather imbalanced; it is just a reproduction of a production issue. I want to understand the logic Elasticsearch and Search Guard follow here, and why it causes this problem. Here is the cluster health:
{
"cluster_name" : "test-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 1032,
"active_shards" : 3096,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 99.96771068776235
}
indices
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open searchguard GuL6pHCUTUKbmygbIsLAYw 1 4 5 0 131.3kb 35.6kb
explanation
"deciders" : [
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[searchguard][0], node[a59ptCI2SfifBWmnmRoqxA], [R], s[STARTED], a[id=d3rMAN8xQi2xrTD3y_SUPA]]"
},
{
"decider" : "awareness",
"decision" : "NO",
"explanation" : "there are too many copies of the shard allocated to nodes with attribute [aws_availability_zone], there are [5] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [3] to be less than or equal to the upper bound of the required number of shards per attribute [2]"
}
]
searchguard config
{
"searchguard" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-all",
"provided_name" : "searchguard",
"creation_date" : "1554095156112",
"number_of_replicas" : "4",
"uuid" : "GuL6pHCUTUKbmygbIsLAYw",
"version" : {
"created" : "6020499"
}
}
}
}
}
Questions I have:
- The searchguard config says "number_of_replicas" : "4", but the allocation explanation says "there are [5] total configured shard copies". Is 5 the four replicas plus the primary?
- Even if so, what is the problem with putting three of these copies in one zone (eu-central-1a)? Even if that zone went down, we would still have two copies in the other zones. Isn't that enough to recover?
- How does Elasticsearch calculate the limit in "required number of shards per attribute [2]"? Given this limit, it seems I can only go up to 2 * zones_count (2 * 3 = 6) in my cluster. That is really not much. It looks like there should be a way to overcome this limit.
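To make the arithmetic concrete, here is a minimal sketch of what I think is going on. It assumes that "0-all" resolves to (data nodes - 1) replicas and that the awareness bound is the ceiling of total copies divided by the number of zone values, which is what the numbers in the explanation suggest. The function names are my own and are not Elasticsearch APIs.

```python
def auto_expanded_replicas(data_nodes, min_replicas=0, max_replicas=None):
    """Replica count for auto_expand_replicas "min-max" (None stands for "all").

    Assumption: the target is one copy per data node, clamped to [min, max].
    """
    upper = data_nodes - 1
    if max_replicas is not None:
        upper = min(upper, max_replicas)
    return max(min_replicas, upper)


def awareness_upper_bound(total_copies, zone_count):
    """Assumed per-zone limit: ceil(total_copies / zone_count), integer math."""
    return (total_copies + zone_count - 1) // zone_count


# Zone layout from the question.
nodes_per_zone = {"eu-central-1a": 3, "eu-central-1b": 1, "eu-central-1c": 1}

data_nodes = sum(nodes_per_zone.values())       # 5 data nodes
replicas = auto_expanded_replicas(data_nodes)   # 4, matches number_of_replicas
total_copies = replicas + 1                     # 5, primary plus replicas
bound = awareness_upper_bound(total_copies, len(nodes_per_zone))  # 2

# same_shard allows one copy per node; awareness allows `bound` copies per zone.
placeable = sum(min(nodes, bound) for nodes in nodes_per_zone.values())
unassigned = total_copies - placeable

print(f"replicas={replicas} total_copies={total_copies} bound={bound}")
print(f"placeable={placeable} unassigned={unassigned}")
```

Under these assumptions eu-central-1a may hold only 2 of the copies that its 3 nodes would otherwise get, so one copy has nowhere to go, which lines up with "unassigned_shards" : 1 in the health output.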