We have Elasticsearch 1.5 on AWS (two t2.micro instances with 10 GB of SSD storage each) and a MySQL database with ~450K fairly large and complex entities.
I'm using Python to read from MySQL, serialize to JSON, and PUT to Elasticsearch. Ten threads work simultaneously, each PUTting a bulk request of 1,000 documents at a time.
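Here is a minimal sketch of the batching step. It assumes each MySQL row is already a dict with an `id` key. The index name `entities` and type `entity` are hypothetical placeholders. The code builds the newline-delimited body that the `_bulk` endpoint expects: an action line followed by the document source, with a trailing newline. Setting `_id` explicitly means a retried document overwrites itself instead of creating a duplicate.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunked(docs: Iterable[Dict[str, Any]], size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` documents."""
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch: List[Dict[str, Any]], index: str = "entities", doc_type: str = "entity") -> str:
    """Build a _bulk payload: one action line plus one source line per document.

    `index`, `doc_type` and the `id` key are hypothetical names; Elasticsearch 1.x
    expects a `_type` in each action.
    """
    lines = []
    for doc in batch:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Each string returned by `bulk_body` would be sent as the body of one `_bulk` request, using whatever HTTP client the threads already use.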
In total there are ~450K documents (1.3 GB). Processing them and sending them to Elasticsearch takes around 20 minutes.
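For scale, the figures above work out roughly as follows. This is a back-of-the-envelope sketch. It assumes 1.3 GB means 1.3e9 bytes and that ~450K means exactly 450,000 documents.

```python
import math

# Figures from the post; 1.3 GB as 1.3e9 bytes and exactly 450,000 docs are assumptions.
TOTAL_DOCS = 450_000
TOTAL_BYTES = 1.3e9
DURATION_S = 20 * 60
BATCH_SIZE = 1000

avg_doc_bytes = TOTAL_BYTES / TOTAL_DOCS            # ~2,889 bytes per document
bulk_request_mb = avg_doc_bytes * BATCH_SIZE / 1e6  # ~2.9 MB per bulk request
bulk_requests = math.ceil(TOTAL_DOCS / BATCH_SIZE)  # 450 bulk requests in total
docs_per_second = TOTAL_DOCS / DURATION_S           # 375 documents per second

print(f"{avg_doc_bytes:.0f} B/doc, {bulk_request_mb:.1f} MB/request, "
      f"{bulk_requests} requests, {docs_per_second:.0f} docs/s")
```

So each of the ten threads is pushing bulk requests of roughly 3 MB at a time into a two-node cluster of t2.micro instances.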
The problem is that only around 85% of them get indexed and the rest are lost. When I reduce the number of documents to ~100K, they all get indexed.
Looking at the Elasticsearch AWS monitor, I can see the CPU going up to 100% while indexing, but it doesn't report any errors.
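One possible reason nothing looks wrong is that, in Elasticsearch 1.x, a `_bulk` call can return HTTP 200 even when some of its items failed. The response body carries a top-level `errors` flag and a per-item `status` (plus an `error` message), for example 429 when the bulk thread pool rejected the work. The sketch below checks for this. It assumes the raw response text is available as a string.

```python
import json
from typing import Any, Dict, List


def failed_items(response_text: str) -> List[Dict[str, Any]]:
    """Return the items of a _bulk response that did not succeed.

    Assumes the response shape {"took": ..., "errors": bool, "items": [{"index": {...}}, ...]}.
    """
    response = json.loads(response_text)
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Each item is keyed by its action name ("index", "create", "update" or "delete").
        action, result = next(iter(item.items()))
        # Treat a missing status as a failure, so nothing is silently counted as indexed.
        if result.get("status", 500) >= 300 or "error" in result:
            failures.append({"action": action, "id": result.get("_id"),
                             "status": result.get("status"), "error": result.get("error")})
    return failures
```

Logging the length of this list for every bulk request should show whether the missing ~15% is being reported back and simply ignored.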
What is the best way to find the bottleneck here? I want it fast, but I can't afford to lose any documents.
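Since no document may be lost, a common approach is to resend only the items that failed and to wait longer between attempts so the bulk queue can drain. In this sketch, `send` is a hypothetical callable (not part of any library) that posts the documents to `_bulk` and returns the set of `id` values whose items failed.

```python
import time
from typing import Callable, Dict, List, Sequence, Set


def index_with_retry(
    batch: Sequence[Dict],
    send: Callable[[Sequence[Dict]], Set],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Send a batch, then resend only the documents that failed, backing off between attempts.

    `send` is hypothetical: it posts the docs to _bulk and returns the ids of failed items
    (for example, items rejected with status 429). Returns the documents that still failed
    after `max_attempts`, so the caller can persist them instead of dropping them.
    """
    pending = list(batch)
    for attempt in range(max_attempts):
        failed_ids = send(pending)
        pending = [doc for doc in pending if doc["id"] in failed_ids]
        if not pending:
            return []
        if attempt < max_attempts - 1:
            # Exponential backoff: 1 s, 2 s, 4 s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
    return pending
```

The `sleep` parameter is only there so the waiting can be replaced in tests. Lowering the number of client threads is the other lever if retries keep piling up.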
EDIT: I've run it again, checking the output of /_cat/thread_pool?v every few minutes. It indexed 390,805 out of 441,400 documents. The thread_pool output is below:
host   ip      bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
<host> x.x.x.x           1         22            84            0           0              0             0            0               0
<host> x.x.x.x           1         11            84            0           0              0             0            0               0
<host> x.x.x.x           1         29            84            0           0              0             0            0               0
<host> x.x.x.x           1         13            84            0           0              0             0            0               0
<host> x.x.x.x           0          0            84            0           0              0             0            0               0
<host> x.x.x.x           1         17            84            0           0              0             0            0               0
<host> x.x.x.x           0          0            84            0           0              0             0            0               0
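`bulk.rejected` is non-zero while `index.rejected` stays at 0. That suggests the losses come from the bulk thread pool rejecting work. A small parser for this `?v` output makes it easy to log the counter before and after a run. The counter is cumulative since each node started, so the difference between two snapshots is what matters. `total_rejected` assumes its input is a single snapshot with one row per node.

```python
from typing import Dict, List


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-aligned `_cat/...?v` output into one dict per row, keyed by header."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def total_rejected(text: str, column: str = "bulk.rejected") -> int:
    """Sum a rejection counter over the rows of one snapshot (one row per node).

    The counter is cumulative since each node started; compare snapshots taken
    before and after a run rather than reading a single one in isolation.
    """
    return sum(int(row[column]) for row in parse_cat_table(text))
```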
EDIT 2
host   ip      bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
<host> x.x.x.x           0          0            84            0           0              0             0            0               0
EDIT 3
$ curl https://xxxxx.es.amazonaws.com/_cat/thread_pool?v&h=id,host,ba,bs,bq,bqs,br,bl,bc,bmi,bma
[1] 15896
host   ip      bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
<host> x.x.x.x           0          0            84            0           0              0             0            0               0
^^ This is a copy/paste of what I'm getting back.
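The `[1] 15896` line is the shell's job-control notice. The URL in that command is unquoted, so `&` ends the command and runs curl in the background. Curl therefore only receives `.../_cat/thread_pool?v`, and `h=...` becomes a separate shell variable assignment instead of reaching the server. That is why the default columns come back. Quoting the URL fixes this in the shell. From Python, letting `urlencode` build the query string avoids the problem altogether. The sketch keeps the post's placeholder host and the column aliases the post requests.

```python
from urllib.parse import urlencode

# Placeholder endpoint from the post; substitute the real domain.
BASE_URL = "https://xxxxx.es.amazonaws.com/_cat/thread_pool"

# Columns requested in the post (short aliases for the bulk thread pool).
COLUMNS = ["id", "host", "ba", "bs", "bq", "bqs", "br", "bl", "bc", "bmi", "bma"]


def cat_url(base: str, columns: list, verbose: bool = True) -> str:
    """Build a _cat URL with `h` and `v` encoded as a proper query string."""
    params = {"h": ",".join(columns)}
    if verbose:
        params["v"] = "true"
    # safe="," keeps the comma-separated column list readable instead of encoding it as %2C.
    return base + "?" + urlencode(params, safe=",")
```

`cat_url(BASE_URL, COLUMNS)` yields `https://xxxxx.es.amazonaws.com/_cat/thread_pool?h=id,host,ba,bs,bq,bqs,br,bl,bc,bmi,bma&v=true`, which can be handed to any HTTP client.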
EDIT 4
$ curl 'https://xxxxx.es.amazonaws.com/_cat/thread_pool?v&h=id,host,ba,bs,bq,bqs,br,bl,bc,bmi,bma'
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
Still nothing.
EDIT 5
id   host   ba bs bq bqs br bmi bma bl br    bc
n6Ad <host>  0  1  0  50 84   1   1  1 84 25821
In some mysterious way, it worked when I changed the order of the params.
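The EDIT 5 header uses the short column aliases. As far as I understand the 1.x `_cat/thread_pool` API, the `b` prefix stands for the bulk pool, and the mapping below expands the aliases. It is worth verifying with `_cat/thread_pool?help` on the actual cluster. Read that way, the row suggests a single bulk thread (`bs`, `bmi` and `bma` are all 1), a queue capacity of 50 (`bqs`), 84 rejections (`br`) and 25,821 completed tasks (`bc`).

```python
from typing import Dict

# Alias -> full column name for the bulk thread pool, as I understand the 1.x aliases;
# check `_cat/thread_pool?help` on your own cluster before relying on it.
BULK_ALIASES = {
    "ba": "bulk.active",
    "bs": "bulk.size",
    "bq": "bulk.queue",
    "bqs": "bulk.queueSize",
    "br": "bulk.rejected",
    "bl": "bulk.largest",
    "bc": "bulk.completed",
    "bmi": "bulk.min",
    "bma": "bulk.max",
}


def decode_row(header_line: str, row_line: str) -> Dict[str, str]:
    """Pair one _cat row with its header, expanding known bulk aliases to full names."""
    header = header_line.split()
    row = row_line.split()
    return {BULK_ALIASES.get(name, name): value for name, value in zip(header, row)}
```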