Aim: sync Elasticsearch with a Postgres database
Why: the network or the cluster/server sometimes breaks, so updates made in the meantime need to be recorded and applied later

This article https://qafoo.com/blog/086_how_to_synchronize_a_database_with_elastic_search.html suggests creating a separate updates table whose ids are kept in sync with Elasticsearch, which makes it possible to select the data added to the database since the last record stored in Elasticsearch. So I thought: what if I track Elasticsearch's failed and successful connections? If the client pongs back successfully (the ping promise resolves), I could launch a function that syncs the records with my database.

Here's my elasticConnect.js

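import elasticsearch from 'elasticsearch'
import syncProcess from './sync'

const client = new elasticsearch.Client({
  host: 'localhost:9200',
  log: 'trace'
});

// Ping the cluster; once it answers, start syncing with the database.
client.ping({
  requestTimeout: Infinity,
  hello: "elasticsearch!"
})
  .then(() => syncProcess()) // successful connection: call syncProcess, don't just reference it
  .catch(err => console.error(err))

// The export happens immediately; it does not wait for the ping or the sync.
export default client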

This way, I don't even need to worry about running a cron job (if the answer to question 1 is yes), since I know that the cluster is running.

Questions

  1. Will syncProcess run before export default client? I don't want any requests coming in while syncing...

  2. syncProcess should run only once (since the module is cached and syncProcess is not exported), no matter how many times I import elasticConnect.js. Correct?

  3. Are there any advantages to using the updates table method, instead of just selecting data from the parent/source table?

  4. The article's comments say "don't use timestamp to compare new data!". Ehhh... why? It should be OK since the database is blocking, right?

Antartica

1 Answer

For 1: As it stands, you have no guarantee that syncProcess will have run by the time the client is exported. Instead, you should do something like in this answer and export a promise.

For 2: With the solution I linked to above, this would be taken care of.
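
To illustrate points 1 and 2, here is a minimal sketch of the export-a-promise idea. It assumes a client object with a promise-returning ping() and a syncProcess function that returns a promise when it is done; createReadyClient is a hypothetical helper name, not part of the elasticsearch package.

```javascript
// Hypothetical helper: resolves with the client only after ping and sync succeed.
// `client` is assumed to expose a promise-returning ping(); `syncProcess` is
// assumed to return a promise (or a plain value) once the sync has finished.
function createReadyClient(client, syncProcess) {
  return client
    .ping({ requestTimeout: 3000 })
    .then(() => syncProcess(client)) // actually call the sync function
    .then(() => client);             // hand the client out only after syncing
}

// In elasticConnect.js this would be evaluated once, at module load, e.g.
//   export default createReadyClient(client, syncProcess)
// and consumers would wait on it before sending requests:
//   import ready from './elasticConnect'
//   ready.then(client => client.search({ q: 'test' }))
```

Because a module is evaluated once and then cached, every import would receive the same promise, so the sync would run a single time and no consumer would see the client before it has settled.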

For 3: An updates table would also catch record deletions, while simply selecting from the DB would not, since you don't know which records have disappeared.
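
A rough sketch of why the updates table helps with deletions. The row shape { seq, docId, op, doc } and the function name applyUpdates are assumptions made for illustration, and a Map stands in for the Elasticsearch index.

```javascript
// Replays update-log rows newer than `lastSeq` against an in-memory index.
// Returns the highest sequence number seen, to be stored as the new sync position.
function applyUpdates(index, updates, lastSeq) {
  let newest = lastSeq;
  for (const u of updates) {
    if (u.seq <= lastSeq) continue;      // already applied in an earlier sync
    if (u.op === 'delete') {
      index.delete(u.docId);             // a deletion is visible only because it was logged
    } else {
      index.set(u.docId, u.doc);         // insert or update
    }
    newest = Math.max(newest, u.seq);
  }
  return newest;
}

const index = new Map([[1, { title: 'a' }], [2, { title: 'b' }]]);
const updates = [
  { seq: 5, docId: 2, op: 'delete' },
  { seq: 6, docId: 3, op: 'upsert', doc: { title: 'c' } },
];
const newPosition = applyUpdates(index, updates, 4);
```

Selecting straight from the source table would find document 3, but nothing in that table says that document 2 used to exist.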

For 4: The second comment after the article you linked to provides the answer (hint: timestamps are not strictly monotonic).
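
One way this can go wrong, sketched with made-up rows: two writes share the same timestamp, and the second only becomes visible after a sync has already run. The field names seq and updatedAt are assumptions for the example.

```javascript
const rows = [
  { seq: 1, updatedAt: 1000, id: 'a' },
  { seq: 2, updatedAt: 1000, id: 'b' }, // same timestamp, visible only after the first sync
];

// The first sync saw only row 'a' and remembered its position both ways.
const seenTimestamp = rows[0].updatedAt;
const seenSeq = rows[0].seq;

// Second sync: select everything "newer" than the remembered position.
const byTimestamp = rows.filter(r => r.updatedAt > seenTimestamp).map(r => r.id);
const bySeq = rows.filter(r => r.seq > seenSeq).map(r => r.id);

console.log(byTimestamp); // [] : row 'b' is never picked up
console.log(bySeq);       // [ 'b' ]
```

Switching the comparison to >= would pick up 'b' but would also re-send 'a' on every run, which is why a strictly increasing sequence number is the safer sync position.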

Val
  • Question: in the article they only mention "in order to sync, run a cron job". But this can't be right, since the last_sequence_id in Elasticsearch will have changed, so old updates would be missed. So I need to make sure that I sync before inserting data into Elasticsearch, correct? – Antartica Oct 03 '16 at 19:01