
I want to set up Postgres and Elasticsearch. But before throwing data into Elasticsearch, I want to prevent data loss when the network or a server goes down. After reading on this topic (https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/), I came up with 3 solutions.

  1. Create a database table (e.g. store) and add any new/updated data to it (see the sketch after this list).

    • During queries: insert data into store.
    • Select new data: SELECT data FROM store WHERE modified > (:last modified time from elasticsearch)
    • Send the "new" data over to Elasticsearch.
  2. Use Redis to pub/sub the requests, and make Elasticsearch listen/subscribe for incoming data. If Elasticsearch breaks, the data will still be in the queue.

  3. Catch any errors during the transaction to Elasticsearch and save the data in a safe place (e.g. the store table mentioned above). Then have a cron job push this data back.
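
Roughly, option 1 would look something like the sketch below (this assumes the psycopg2 and elasticsearch Python clients, an index also named store, and id/data/modified columns; those details are placeholders, not decisions I've made yet):

    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")
    pg = psycopg2.connect("dbname=app user=app")

    def newest_in_es(index="store"):
        # Newest `modified` value already indexed; None if the index is empty.
        resp = es.search(index=index,
                         body={"size": 1, "sort": [{"modified": {"order": "desc"}}]})
        hits = resp["hits"]["hits"]
        return hits[0]["_source"]["modified"] if hits else None

    def push_new_rows():
        since = newest_in_es()
        with pg.cursor() as cur:
            if since is None:
                cur.execute("SELECT id, data, modified FROM store")
            else:
                cur.execute("SELECT id, data, modified FROM store "
                            "WHERE modified > %s", (since,))
            # Stream the "new" rows into Elasticsearch with the bulk helper.
            actions = ({"_index": "store", "_id": row[0],
                        "_source": {"data": row[1], "modified": row[2].isoformat()}}
                       for row in cur)
            helpers.bulk(es, actions)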


Of course the easiest thing would be to insert data into Elasticsearch straight away. But doing so means the data isn't stored anywhere safe when something breaks. Option 1 is too slow in my opinion, unlike 2, and 3 requires maintaining error-handling code.

For now, option 2 is my pick.
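
For reference, the consumer side of option 2 could be sketched like this (the es-updates channel name and the redis/elasticsearch Python clients are just placeholders; the publisher side would simply PUBLISH a JSON document after each write to Postgres):

    import json
    import redis
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    r = redis.Redis()

    def consume(channel="es-updates"):
        pubsub = r.pubsub()
        pubsub.subscribe(channel)
        for message in pubsub.listen():
            if message["type"] != "message":
                continue  # skip the initial subscribe confirmation
            doc = json.loads(message["data"])
            # Indexing by id keeps re-deliveries idempotent.
            es.index(index="store", id=doc["id"], body=doc)

(Worth noting: plain Redis pub/sub is fire-and-forget, so anything published while the subscriber is down is lost rather than queued.)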


Are there better ways to do this? I'd like to hear your opinions and any new suggestions.

:D

Antartica

1 Answer


Redis (option 2) isn't reliable.

What I decided to do is add data to Elasticsearch straight away and also add it to an updates table. Then I run a sync() function right after connecting the Elasticsearch client (in case the cluster went down before), plus a cron job every 24 hours that launches sync(). All sync() does is select the newest data (by time or id) from updates (A) and Elasticsearch (B) and compare them; if there are records where A > B, it inserts the data using the bulk API.
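
A rough sketch of that sync() logic, assuming an updates table with (id, data, modified) columns and the same Python clients as in the question (real table and field names may differ):

    from elasticsearch import helpers

    def sync(pg, es, index="store"):
        # Newest record already in Elasticsearch (B); None if the index is empty.
        resp = es.search(index=index,
                         body={"size": 1, "sort": [{"modified": {"order": "desc"}}]})
        hits = resp["hits"]["hits"]
        newest_b = hits[0]["_source"]["modified"] if hits else None

        with pg.cursor() as cur:
            # Records in updates (A) that are newer than anything in B.
            if newest_b is None:
                cur.execute("SELECT id, data, modified FROM updates")
            else:
                cur.execute("SELECT id, data, modified FROM updates "
                            "WHERE modified > %s", (newest_b,))
            actions = ({"_index": index, "_id": row[0],
                        "_source": {"data": row[1], "modified": row[2].isoformat()}}
                       for row in cur)
            helpers.bulk(es, actions)

sync() gets called once right after the client connects and again from the 24-hour cron job.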

Hope this helps :)

And I am still open to suggestions and feedback...

Antartica