We need to denormalize 2 million records from our MySQL database into ElasticSearch. Our devops guy set up ElasticSearch on AWS. I wrote a Clojure app that pulls the data out of MySQL, aggregates it into the format we want, and then PUTs each document to ElasticSearch. I deployed it on our EC2 instance, the devops guy set the AWS roles correctly, and I started the app. After 10 minutes I ran:
curl --verbose -d '{"query": { "match_all": {} }}' -H 'Content-Type: application/json' -X GET "https://search-samedayes01-ntt7r7b7sfhy3wu.us-east-1.es.amazonaws.com/facts-over-time/_search"
and I saw:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":14952,"max_score":1.0,"hits": [...
Awesome! It's working! I looked at some of the documents and they looked good.
Another 15 minutes went by and I ran the same query as above. Sadly, I got the same result:
{"took":1,"timed_out":false,"_shards": {"total":5,"successful":5,"failed":0},"hits": {"total":14952,"max_score":1.0,"hits": [...
What? Why would it accept 14,952 records and then stop?
My Clojure function is set to throw an error if there are any problems:
(defn push-item-to-persistence
  [item db]
  (let [denormalized-id (get-in item [:denormalized-id] :no-id)
        item (-> item
                 (assoc :updated-at (temporal/current-time-as-datetime))
                 (assoc :permanent-holding-id-for-item-instances (java.util.UUID/randomUUID))
                 (assoc :instance-id-for-this-one-item (java.util.UUID/randomUUID))
                 (assoc :item-type :deduplication))]
    (if (= denormalized-id :no-id)
      (slingshot/throw+ {:type ::no-denormalized-id-in-push-item-into-database
                         :item item})
      (slingshot/try+
        (esd/put db "facts-over-time" "deduplication" (str denormalized-id) item)
        (println "We just put a document in ES.")
        (catch Object o
          (slingshot/throw+ {:type ::push-item-to-persistence
                             :error o
                             :item item
                             :db db}))))))
If I look at the logs, there are no errors, and I keep seeing this line printed out:
We just put a document in ES.
It has now been over an hour, and we still seem to be stuck at 14,952 documents.
What might have gone wrong? And why don't I see an error?
I'm using Elastisch as the library to connect Clojure to AWS ES.
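One diagnostic worth running (a sketch; `ES_HOST` is a placeholder for the actual AWS ES endpoint shown earlier, and whether this explains the stall here is untested) is checking whether the cluster is silently queuing or rejecting writes, and what the index itself thinks its document count is:

```shell
# Replace with the real AWS ES endpoint from the curl command above.
ES_HOST="https://search-samedayes01-example.us-east-1.es.amazonaws.com"

# Per-node thread pool stats: a nonzero "rejected" column for the
# index/bulk pools means the cluster is pushing back on writes.
curl -s "$ES_HOST/_cat/thread_pool?v&h=node_name,name,active,queue,rejected"

# Document count as reported by the index itself.
curl -s "$ES_HOST/_cat/count/facts-over-time?v"
```

If the count stays flat while puts keep succeeding, it is also worth remembering that a PUT to an existing `_id` overwrites the document rather than adding a new one.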
Update
Okay, now at least I see these exceptions. I'm not clear on where they are being caught. Everywhere in my code I rethrow exceptions, because I want the app to die on the first exception. Yet these are being caught somewhere, possibly in the Elastisch library I'm using, or perhaps I accidentally catch and log them somewhere myself.
But that is a somewhat secondary question. More important: why am I getting these exceptions in the first place? And where do I adjust AWS ElasticSearch so that it accepts our writes at a reasonable speed?
Oct 04, 2017 6:53:44 PM org.apache.http.impl.client.DefaultHttpClient tryConnect
INFO: I/O exception (java.net.SocketException) caught when connecting to {s}->https://search-samedayes01-ntsdht7sfhy3wu.us-east-1.es.amazonaws.com:443: Broken pipe (Write failed)
Oct 04, 2017 7:09:06 PM org.apache.http.impl.client.DefaultHttpClient tryConnect
INFO: Retrying connect to {s}->https://search-samedayes01-ntsdht7sfhy3wu.us-east-1.es.amazonaws.com:443
Oct 04, 2017 6:54:13 PM org.apache.http.impl.client.DefaultHttpClient tryConnect
INFO: I/O exception (java.net.SocketException) caught when connecting to {s}->https://search-samedayes01-ntsdht7sfhy3wu.us-east-1.es.amazonaws.com:443: Broken pipe (Write failed)
Oct 04, 2017 7:09:09 PM org.apache.http.impl.client.DefaultHttpClient tryConnect
INFO: Retrying connect to {s}->https://search-samedayes01-ntsdht7sfhy3wu.us-east-1.es.amazonaws.com:443
Update 2
I started over again. About 920 documents were put to ElasticSearch successfully. And then I got:
:hostname "UnknownHost"
:type java.io.EOFException
:message "SSL peer shut down incorrectly"
What?
Also, the writes seem crazy slow. Perhaps 10 operations per second. There must be something in AWS that I can adjust that will make our ElasticSearch nodes accept more writes? I'd like to get at least 1,000 writes a second.
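One index-level setting that commonly matters during a one-off bulk load (a sketch against a placeholder `ES_HOST`; whether it helps in this particular case is untested) is the refresh interval. Disabling refresh while loading, then restoring it, reduces per-write overhead:

```shell
# Replace with the real AWS ES endpoint.
ES_HOST="https://search-samedayes01-example.us-east-1.es.amazonaws.com"

# Disable refresh for the duration of the bulk load...
curl -X PUT "$ES_HOST/facts-over-time/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "-1"}}'

# ...and restore a normal interval once the load finishes.
curl -X PUT "$ES_HOST/facts-over-time/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "1s"}}'
```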
Update 3
So now I got it to the point where this app mostly works, but it works in the oddest way I can imagine.
I was getting a "broken pipe" message, which led me here:
SSL peer shut down incorrectly in Java
Following that advice I did this:
(System/setProperty "https.protocols" "TLSv1.1")
Which seemed to have no effect.
But now my app does this:
- Moves at a glacial speed, making perhaps 1 write to ElasticSearch per second.
- Throws the "broken pipe" Exception.
- Takes off like a rocket and starts writing about 15,000 requests to ElasticSearch per minute.
I'm glad it's finally working, but I'm uncomfortable with the fact that I have no idea why it is working.
Also, 15,000 requests per minute is not actually that fast: moving 2 million documents at that rate takes over 2 hours, which is terrible. However, Amazon only supports the REST interface to ElasticSearch. I've read that the native protocol would be about 8 times faster, which sounds like what we need.
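Before reaching for the native protocol, batching writes through the REST `_bulk` endpoint usually closes most of the gap over one PUT per document. Elastisch exposes this via `clojurewerkz.elastisch.rest.bulk`; a sketch, where `db` is the connection used in the code above, `items` is the sequence of denormalized documents, and the batch size of 500 is an assumption to be tuned:

```clojure
(require '[clojurewerkz.elastisch.rest.bulk :as bulk])

;; Sketch: index documents in batches of 500 instead of one PUT each.
;; `db` and `items` are assumed from the surrounding code; 500 is a
;; starting point, not a measured optimum for this cluster.
(doseq [batch (partition-all 500 items)]
  (bulk/bulk-with-index-and-type
    db "facts-over-time" "deduplication"
    (bulk/bulk-index batch)))
```

Each call sends one HTTP request carrying 500 index operations, so the per-request TLS and connection overhead is amortized across the batch.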