
I ran into an issue with unassigned searchguard shards after adding new nodes to my Elasticsearch cluster. The cluster runs in a public cloud and has allocation awareness enabled with node.awareness.attributes: availability_zone. The searchguard index has replica auto-expansion enabled by default. The problem reoccurs whenever I have three nodes in one zone and one node in each of the other two zones:

  • eu-central-1a = 3 nodes
  • eu-central-1b = 1 node
  • eu-central-1c = 1 node

I do understand that this cluster configuration is rather imbalanced; it is just a replay of a production issue. I want to understand the logic of Elasticsearch and searchguard, and why it causes this issue. Here is my cluster health:

{
  "cluster_name" : "test-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 1032,
  "active_shards" : 3096,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.96771068776235
}

indices

health status index                                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   searchguard                           GuL6pHCUTUKbmygbIsLAYw   1   4          5            0    131.3kb         35.6kb

explanation

"deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[searchguard][0], node[a59ptCI2SfifBWmnmRoqxA], [R], s[STARTED], a[id=d3rMAN8xQi2xrTD3y_SUPA]]"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [aws_availability_zone], there are [5] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [3] to be less than or equal to the upper bound of the required number of shards per attribute [2]"
        }
      ]

searchguard config

{
  "searchguard" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-all",
        "provided_name" : "searchguard",
        "creation_date" : "1554095156112",
        "number_of_replicas" : "4",
        "uuid" : "GuL6pHCUTUKbmygbIsLAYw",
        "version" : {
          "created" : "6020499"
        }
      }
    }
  }
}
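To check where "number_of_replicas" : "4" comes from, here is a minimal Python sketch of how I assume "auto_expand_replicas" : "0-all" resolves: the replica count follows the number of data nodes minus one, clamped to the configured range. The function name and parameters are my own, not an Elasticsearch API, and the clamping is a simplification.

```python
from typing import Optional


def auto_expanded_replicas(data_nodes: int,
                           min_replicas: int = 0,
                           max_replicas: Optional[int] = None) -> int:
    """Assumed behaviour of auto_expand_replicas "<min>-<max>".

    max_replicas=None stands for "all". Hypothetical helper for
    illustration only, not taken from the Elasticsearch source.
    """
    # One copy per data node at most, and one of those is the primary.
    desired = data_nodes - 1
    upper = desired if max_replicas is None else min(max_replicas, desired)
    return max(min_replicas, upper)


# Values from the cluster health above: 5 data nodes, setting "0-all".
replicas = auto_expanded_replicas(data_nodes=5)
total_copies = 1 + replicas  # primary + replicas
print(replicas, total_copies)
```

With 5 data nodes this gives 4 replicas, so 5 shard copies in total, which would match the number_of_replicas in the settings above.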

questions I have:

  • The searchguard settings say "number_of_replicas" : "4", but the allocation explanation says there are [5] total configured shard copies. Is 5 the four replicas plus the primary? Even if so...
  • What is the problem with putting three of these shard copies into one zone (eu-central-1a)? Even if that zone collapsed, we would still have two copies in the other zones. Isn't that enough to recover?
  • How does Elasticsearch calculate the bound in "required number of shards per attribute [2]"? Given this limit, I can raise the number of copies only up to 2*zones_count (2*3 = 6) for my cluster. This is really not much. It looks like there should be a way to overcome this limit.
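The numbers in the awareness explanation are consistent with a simple ceiling calculation, so here is a sketch of what I assume the decider does: the per-zone upper bound is ceil(total copies / number of zones), and the same_shard decider allows at most one copy per node. This is my reading of the explanation output, not code taken from Elasticsearch, and the function names are hypothetical.

```python
from typing import Dict


def max_copies_per_zone(total_copies: int, zone_count: int) -> int:
    """Assumed awareness bound: ceil(total_copies / zone_count)."""
    return (total_copies + zone_count - 1) // zone_count


def assignable_copies(nodes_per_zone: Dict[str, int], total_copies: int) -> int:
    """Copies that can be placed if each zone holds at most the awareness
    bound and each node holds at most one copy (same_shard decider)."""
    bound = max_copies_per_zone(total_copies, len(nodes_per_zone))
    capacity = sum(min(nodes, bound) for nodes in nodes_per_zone.values())
    return min(total_copies, capacity)


# Zone layout and copy count from the question: 1 primary + 4 replicas.
zones = {"eu-central-1a": 3, "eu-central-1b": 1, "eu-central-1c": 1}
total_copies = 5

bound = max_copies_per_zone(total_copies, len(zones))
assigned = assignable_copies(zones, total_copies)
print(bound, assigned, total_copies - assigned)
```

Under these assumptions the bound is 2, eu-central-1a can take only 2 of its 3 possible copies, and the single-node zones take 1 each, so 4 copies are placed and 1 stays unassigned, which matches "unassigned_shards" : 1.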
srgbnd
bulnv
