
I was working with Elasticsearch and it was working perfectly. Today I restarted my remote server (Ubuntu), and now searching my indexes gives me this error:

{"error":"SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed]","status":503}

I also checked the cluster health; the status is red. Can anyone tell me what the issue is?

Jonathan Hall
user3176531

6 Answers


It is possible that on restart some shards were not recovered, causing the cluster to stay red.
If you hit http://<yourhost>:9200/_cluster/health?level=shards you can look for the red shards.

I have had issues on restart where shards end up in a non-recoverable state. My solution was simply to delete that index completely, which is not an ideal solution for everyone.
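The shard-level check above can be scripted; here is a minimal sketch, assuming the node listens on localhost:9200 and python3 is available (adjust the host to yours):

```shell
# Print only the red indexes from the shard-level health output
# (assumes localhost:9200; replace with your host).
curl -s 'http://localhost:9200/_cluster/health?level=shards' |
  python3 -c 'import json, sys
health = json.load(sys.stdin)
for name, idx in health["indices"].items():
    if idx["status"] == "red":
        print(name)'
```

Once you've identified a broken index, the (destructive) delete is `curl -XDELETE 'http://localhost:9200/<index-name>'`.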

It is also nice to visualize issues like this with a plugin like:
Elasticsearch Head

mconlin

If you're running a single-node cluster for some reason, you may simply need to avoid replicas, like this:

curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'

Doing this forces Elasticsearch to run without replicas; on a single node, replica shards can never be allocated anyway.
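To confirm the setting took effect, you can read it back; a sketch, again assuming localhost:9200 and python3:

```shell
# Print number_of_replicas for every index; after the PUT above,
# each line should end in 0 (assumes localhost:9200).
curl -s 'http://localhost:9200/_settings' |
  python3 -c 'import json, sys
for name, idx in json.load(sys.stdin).items():
    print(name, idx["settings"]["index"]["number_of_replicas"])'
```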

Paulo Victor

First things first: the all shards failed exception is not as dramatic as it sounds. It means shards failed while serving a request (query or index), and there can be multiple reasons for it, such as:

  1. The shards are actually in a non-recoverable state; if your cluster and index health are YELLOW or RED, this is one of the reasons.
  2. Shard recovery is happening in the background, so the shards didn't respond.
  3. Your query has bad syntax, and ES responds with all shards failed.

In order to fix the issue, you need to determine which of the above categories it falls into, and apply the appropriate fix.

The one mentioned in the question is clearly in the first bucket, as the cluster health is RED, meaning one or more primary shards are missing; this SO answer of mine will help you fix the RED cluster issue, which will fix the all shards exception in this case.
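A quick way to pick the bucket is to read the cluster status first (a sketch, assuming localhost:9200): RED points at missing primaries (case 1), while GREEN/YELLOW alongside this error suggests recovery in progress or query syntax (cases 2 and 3).

```shell
# Print just the cluster status: green, yellow, or red
# (assumes the node listens on localhost:9200).
curl -s 'http://localhost:9200/_cluster/health' |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["status"])'
```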

danronmoon
Amit

If you encounter this apparent index corruption in a running system, you can work around it by deleting all files called segments.gen. It is advisory only, and Lucene can recover correctly without it.

From the Elasticsearch blog

LhasaDad
chemark
    The current link is redirecting to the main elastic.co page. It no longer shows the blog entry. Edit submitted. – LhasaDad Dec 31 '20 at 03:39

For Elasticsearch > 5.0 it's possible to get some more information from this endpoint:

http://localhost:9200/_cluster/allocation/explain?pretty

I just ran into a case where I hit the virtual disk limit configured in Docker Desktop, and adding an additional, unrelated container caused ES to fail.

Roopendra
Alex

If you are upgrading Elasticsearch and have nodes running multiple versions, you can face this issue. Continue until ALL nodes are upgraded, then run the daemon reload:

sudo systemctl daemon-reload
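To check whether the rolling upgrade is actually complete, you can list the distinct versions in the cluster (a sketch assuming localhost:9200); more than one line of output means some nodes are still on the old version:

```shell
# One line per distinct Elasticsearch version in the cluster;
# a finished upgrade prints exactly one line (assumes localhost:9200).
curl -s 'http://localhost:9200/_cat/nodes?h=version' | sort -u
```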

Musab Dogan