
I am using ELK 6.8.9, with all configuration in my docker-compose file. It was working fine, but suddenly I started getting the error

org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
or org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

This is how I configure ELK:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.9
    environment:
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    volumes:
      - /opt/services/data/elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    user: ${USER_ID}

On calling:

curl -XGET 'localhost:9200/_cluster/health/balance_sheet?pretty'

I get the response:
{
  "cluster_name" : "docker-cluster",
  "status" : "red",
  "timed_out" : true,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
    do you care about the data or can you delete it? did you mount the volume? can you share how you run your Elasticsearch container? – ItayB Dec 16 '20 at 07:57
  • no, I do not care about the data right now, I just want to configure my ELK the right way. I also mount the volumes in my docker-compose file, as you can see above, and Elasticsearch is running inside a Docker container – Er.Garvesh Dec 16 '20 at 08:12

2 Answers


Assuming you don't care about the data (as you said in the comments on the question), you can delete the /opt/services/data/elasticsearch folder on your local machine.

Do that while your container is down, then bring it up again.
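A minimal sketch of these steps, assuming your compose service is the one from the question and the same host data path (note: this destroys all index data):

```shell
# Stop the stack so the data directory is no longer in use
docker-compose down

# Delete the mounted Elasticsearch data directory (destroys all indices!)
sudo rm -rf /opt/services/data/elasticsearch
mkdir -p /opt/services/data/elasticsearch

# Bring the stack back up; Elasticsearch starts with a fresh cluster state
docker-compose up -d
```

Make sure the recreated directory is still writable by the user set in `user: ${USER_ID}`.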

  • but that is only a temporary solution. What happens if I do care about the data? Is there any way to solve this problem without losing my data? – Er.Garvesh Dec 16 '20 at 08:33
  • @Er.Garvesh right. First, let's see that it solves your current issue. I am not sure what you did, but it sounds like shards are corrupted. Did you take the container down in the middle of indexing or something? – ItayB Dec 16 '20 at 09:05
  • Not sure about the middle of indexing, but sometimes I restart my server while making changes in the Logstash files. I also tried deleting the index and restarting Elasticsearch, but after some time I get this error again. Whenever I delete the index and restart Elasticsearch it works for a while, but that is not a permanent fix for me – Er.Garvesh Dec 16 '20 at 09:22
  • why are you restarting Elasticsearch after index deletion? – ItayB Dec 16 '20 at 09:35
  • https://stackoverflow.com/questions/21157466/all-shards-failed - Check this one – Er.Garvesh Dec 16 '20 at 09:40
  • the restart solution is from 2014. Look here: https://stackoverflow.com/a/55111098/1011253 did you set replication to 0? – ItayB Dec 16 '20 at 09:45
  • No, I have not set replication to 0; I am just using the default configuration: { "cluster_name": "docker-cluster", "status": "yellow", "timed_out": false, "number_of_nodes": 1, "number_of_data_nodes": 1, "active_primary_shards": 277, "active_shards": 277, "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 245, "delayed_unassigned_shards": 0, "number_of_pending_tasks": 0, "number_of_in_flight_fetch": 0, "task_max_waiting_in_queue_millis": 0, "active_shards_percent_as_number": 53.06513409961686 } – Er.Garvesh Dec 16 '20 at 10:00
  • so I guess this is your problem. The Elasticsearch default replication factor is 1, which means you have two copies of each shard. Two copies of the same shard can't live on the same node, and since you are running with only one node, the second copy becomes unassigned. You need to either change the default replication factor to 0 or add another Elasticsearch node – ItayB Dec 16 '20 at 10:58
  • Yes, I am running a single node, but the default number of shards is 5. If I run GET _cat/shards/fundapi-2020.12 I get: API-2020.12 4 p STARTED 130 1.2mb 10.0.2.36 yCpzt6z API-2020.12 4 r UNASSIGNED API-2020.12 1 p STARTED 133 989.9kb 10.0.2.36 yCpzt6z API-2020.12 1 r UNASSIGNED API-2020.12 3 p STARTED 137 899.7kb 10.0.2.36 yCpzt6z API-2020.12 3 r UNASSIGNED API-2020.12 2 p STARTED 147 1mb 10.0.2.36 yCpzt API-2020.12 2 r UNASSIGNED API-2020.12 0 p STARTED 128 980.9kb 10.0.2.36 yCpzt6z API-2020.12 0 r UNASSIGNED – Er.Garvesh Dec 16 '20 at 11:16
  • @Er.Garvesh shards and replicas are two different things. The default number of shards is 5 indeed, and different shards can live on the same node. If a shard has replicas (as in your case, where every shard has 1 replica), you have 10 shards in total (5 primaries and 5 replicas). Those replicas need another node, or you need to set replicas to 0. – ItayB Dec 16 '20 at 11:48
  • @Er.Garvesh any updates? did you try changing the replication factor? – ItayB Dec 17 '20 at 14:41
  • Yes, I made some changes and set the replicas to 0, so now I have only 5 shards. But after that I am facing a new exception in my Logstash log: "Attempted to send a bulk request to elasticsearch, but there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down?" Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::SocketException] – Er.Garvesh Dec 18 '20 at 07:32
  • @Er.Garvesh I'm happy to hear! I would appreciate it if you accept my answer above if it helped you. You can open a new question for the new problem and I'll try to assist – ItayB Dec 18 '20 at 09:32
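The fix discussed in the comments, setting replicas to 0 on a single-node cluster, can be sketched as follows (the template name `zero_replicas` is an arbitrary choice of mine; any name works):

```shell
# Drop replicas on all existing indices so the cluster can go green on one node
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 0 } }'

# Make future indices default to 0 replicas (Elasticsearch 6.x template syntax)
curl -X PUT "localhost:9200/_template/zero_replicas" \
  -H 'Content-Type: application/json' \
  -d '{ "index_patterns": ["*"], "settings": { "number_of_replicas": 0 } }'
```

Without the template, every new daily Logstash index would again be created with 1 replica and turn the cluster yellow.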

Steps to solve Elasticsearch 6.5.1 issue

  1. Disable read-only mode on the indices:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'{ "index.blocks.read_only": false }'
  2. Restart the non-master nodes.
  3. Review the indices' status via the REST API:
curl http://10.146.64.42:9200/_cat/indices
  4. Review the cluster health:
curl http://10.146.64.42:9200/_cluster/health

After a while, the Elasticsearch cluster will return to green health status.

  • could you please give a reason: what is the meaning/origin of the error? A solution is always good, but it is only half as useful without an explanation of why it occurred. – Flummiboy Oct 13 '22 at 09:48