8

I'm trying to index a containerized Elasticsearch db using the Python client https://github.com/elastic/elasticsearch-py called from a script (running in a container too).

Judging from existing examples, docker-compose seems like a useful tool for this. My directory structure is

docker-compose.yml
indexer/
- Dockerfile
- indexer.py
- requirements.txt
elasticsearch/
- Dockerfile

My docker-compose.yml reads

version: '3'

services:
  elasticsearch:
    build: elasticsearch/
    ports: 
      - 9200:9200
    networks:
      - deploy_network
    container_name: elasticsearch

  indexer:
    build: indexer/
    depends_on:
      - elasticsearch
    networks:
      - deploy_network
    container_name: indexer
  
networks:
  deploy_network:
    driver: bridge

indexer.py reads

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
    
es = Elasticsearch(hosts=[{"host":'elasticsearch'}]) # what should I put here?

actions = [
    {
    '_index' : 'test',
    '_type' : 'content',
    '_id' : str(item['id']),
    '_source' : item,
    }
for item in [{'id': 1, 'foo': 'bar'}, {'id': 2, 'foo': 'spam'}]
]
    
# create index
print("Indexing Elasticsearch db... (please hold on)")
bulk(es, actions)
print("...done indexing :-)")

The Dockerfile for the elasticsearch service is

FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.1.3
EXPOSE 9200
EXPOSE 9300

and that for the indexer is

FROM python:3.6-slim
WORKDIR /app
ADD . /app
RUN pip install -r requirements.txt
ENTRYPOINT [ "python" ]
CMD [ "indexer.py" ]

with requirements.txt containing only elasticsearch to be downloaded with pip.

Running with docker-compose run indexer gives me the error message at https://pastebin.com/6U8maxGX (ConnectionRefusedError: [Errno 111] Connection refused). elasticsearch is up as far as I can see with curl -XGET 'http://localhost:9200/' or by running docker ps -a.

How can I modify my docker-compose.yml or indexer.py to solve the problem?

P.S. A (working) version (informed by the answers below) of the code can be found here, for completeness' sake: https://github.com/davidefiocco/dockerized-elasticsearch-indexer.

Davide Fiocco

3 Answers

12

The issue is a synchronisation bug: elasticsearch hasn't fully started when indexer tries to connect to it. depends_on only controls the order in which the containers are started; it does not wait for the service inside to be ready. You'll have to add retry logic to make sure that elasticsearch is up and running before you run queries against it. Something like calling es.ping() in a loop with exponential backoff until it succeeds should do the trick.
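
A minimal sketch of such a loop, assuming only that the client exposes a ping() method returning True or False (as the elasticsearch-py client does). The function name wait_for_elasticsearch and its parameters are illustrative and not part of any library:

```python
import time


def wait_for_elasticsearch(client, max_attempts=8, base_delay=0.5,
                           max_delay=30.0, sleep=time.sleep):
    """Poll client.ping() until it succeeds, doubling the delay after each failure.

    Returns True as soon as a ping succeeds, or False if all attempts fail.
    The `sleep` argument exists only so the waiting can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if client.ping():
            return True
        if attempt < max_attempts:
            sleep(delay)
            # Exponential backoff, capped so a single wait never grows too long.
            delay = min(delay * 2, max_delay)
    return False
```

In indexer.py this would be called right after creating the client, e.g. `if not wait_for_elasticsearch(es): raise SystemExit("Elasticsearch is not reachable")`, and before the call to bulk(). The default values are arbitrary and would need tuning to how long Elasticsearch takes to start in a given setup.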

UPDATE: The Docker HEALTHCHECK instruction can be used to achieve a similar result (i.e. make sure that elasticsearch is up and running before trying to run queries against it).

Mihai Todor
    OK! Following your suggestion, adding `import time` and `time.sleep(12)` in my `indexer.py` did it. Thanks Mihai! The code at https://github.com/davidefiocco/dockerized-elasticsearch-indexer is working for me. – Davide Fiocco Feb 10 '18 at 14:34
    Also, for reference, this can be useful: https://docs.docker.com/compose/startup-order/ – Davide Fiocco Feb 10 '18 at 16:02
2

To make @Mihai_Todor's update more explicit: we could use HEALTHCHECK (Docker 1.12+), for instance with a command like:

curl -fsSL "http://$(hostname --ip-address):9200/_cat/health?h=status" | grep -E '^green'

To answer this question using HEALTHCHECK:

FROM python:3.6-slim

WORKDIR /app
ADD . /app
RUN pip install -r requirements.txt

HEALTHCHECK CMD curl -fsSL "http://$(hostname --ip-address):9200/_cat/health?h=status" | grep -E '^green'

ENTRYPOINT [ "python" ]
CMD [ "indexer.py" ]
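
For comparison, the same status check can be sketched in Python and run from the indexer itself. This is only an illustration: the function names are made up, and the default URL assumes the `elasticsearch` service name from the question's docker-compose.yml. The `accepted` parameter is there because a single-node cluster may report yellow rather than green once indices with replica shards exist.

```python
import urllib.request


def status_is_acceptable(body, accepted=("green",)):
    """Return True if a `_cat/health?h=status` response body names an accepted status."""
    return body.strip().lower() in accepted


def cluster_is_healthy(base_url="http://elasticsearch:9200",
                       accepted=("green",), timeout=5):
    """Fetch the cluster status and compare it with the accepted values.

    Any connection problem is treated as "not healthy yet".
    """
    url = base_url.rstrip("/") + "/_cat/health?h=status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except OSError:  # URLError and socket timeouts are subclasses of OSError
        return False
    return status_is_acceptable(body, accepted)
```

Calling cluster_is_healthy() in a loop with a pause between attempts gives roughly the same effect as the grep on `^green` above, without needing curl in the image.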
Filippo Vitale
0

I use the retry decorator from the retrying library to make sure Elasticsearch is ready to accept connections:

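from elasticsearch import Elasticsearch
from retrying import retry

client = Elasticsearch()


class IndexerService:

    @staticmethod
    @retry(wait_exponential_multiplier=500, wait_exponential_max=100000)
    def init():
        # MyDocumentIndex is defined elsewhere in the application (not shown here).
        MyDocumentIndex.init()

# Retries until ES is ready; the wait grows exponentially, capped at 100 s per wait.
IndexerService.init()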

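The wait between attempts doubles each time, up to a cap of 100 seconds per wait. Note that wait_exponential_max limits each individual wait, not the total time: without a stop condition such as stop_max_delay, the call keeps retrying until it succeeds.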

Reference: https://github.com/rholder/retrying

Davide Fiocco
Vladimir Obrizan