
I have a MySQL database with one table (no joins) of ~100 million rows. These rows can be updated or removed, and new records are also saved into the database periodically, e.g. every minute. How can I keep my Elasticsearch indexes updated when data is updated/changed/deleted in MySQL?

Keeping the data fresh is a must for me; it's very important to stay up to date with MySQL. Is this possible? Thanks.

I have also tried the solution with `schedule => "* * * * *"` and `updated_at > :sql_last_value`, but it is very slow.

Rammgarot
  • this answer might help: https://stackoverflow.com/questions/34477095/elasticsearch-replication-of-other-system-data/34477639#34477639 – Val Sep 07 '17 at 11:48
  • With tables of the size you mention, it's helpful to put time frames around your specification of "up to date." Can you specify this? For example: Elasticsearch must rarely be more than 60 seconds behind MySQL, and never more than 300 seconds behind. If you have a spec like that, you can implement and test against it. – O. Jones Sep 07 '17 at 12:06
  • @O.Jones By "up to date" I mean as fast as possible, tending toward zero lag. Up to about 50k rows can be updated at once. Ideally I'd want real time, but I understand that is not possible; I just want to keep my data in Elasticsearch as fresh as possible. I tried the solution with `updated_at > :sql_last_value`, but it is very slow even with a MySQL index on updated_at. – Rammgarot Sep 07 '17 at 13:43
  • @Val Thanks for the reply, I'll check out the Logstash JDBC input. – Rammgarot Sep 07 '17 at 13:48

2 Answers


Not sure what you mean by "very slow", but if it's about the sampling frequency, then the expression `schedule => "*/2 * * * * *"`, for instance, will execute the query you have defined in the `statement` setting of your config file every two seconds.
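
For reference, a minimal pipeline along those lines might look like the sketch below. The connection settings, table, column, and index names are placeholders for illustration, not values from the question:

```
input {
  jdbc {
    # Placeholder connection settings -- adjust to your environment
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Six-field rufus-scheduler syntax: run every two seconds
    schedule => "*/2 * * * * *"
    # Only fetch rows changed since the last successful run
    statement => "SELECT * FROM items WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "items"
    # Reuse the MySQL primary key so updates overwrite the same document
    document_id => "%{id}"
  }
}
```

Note that deletes are the weak point of this polling approach: the query only sees rows that still exist, so deleted rows never show up and must be handled separately (e.g. with a soft-delete flag column).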

Nir Duan

Thanks to everyone for the replies. I've found a solution that works for me, thanks to Aaron Mildenstein, who helped me on discuss.elastic.co. See the thread here.

The answer:

In order to achieve this, you will need a separate process that monitors binlog transactions and initiates updates to the Elasticsearch end of things. Elastic does not currently offer any tools that do this, so you’ll have to find one that someone open sourced, or write your own.

To parse binlogs I used this lib: https://github.com/siddontang/go-mysql-elasticsearch
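
For illustration, here is a minimal sketch of the same binlog-tailing idea in Go, using the canal package from the go-mysql library that the project above builds on. The connection settings are placeholders, and the actual Elasticsearch write is left as a comment:

```go
package main

import (
	"log"

	"github.com/siddontang/go-mysql/canal"
)

// rowHandler reacts to binlog row events; embedding DummyEventHandler
// gives no-op implementations for the events we don't care about.
type rowHandler struct {
	canal.DummyEventHandler
}

func (h *rowHandler) OnRow(e *canal.RowsEvent) error {
	// e.Action is one of canal.InsertAction, canal.UpdateAction or
	// canal.DeleteAction; e.Rows holds the affected column values.
	// Here you would translate the event into the corresponding
	// Elasticsearch index/update/delete request (e.g. via the bulk API).
	log.Printf("%s on %s.%s: %d row(s)", e.Action, e.Table.Schema, e.Table.Name, len(e.Rows))
	return nil
}

func main() {
	cfg := canal.NewDefaultConfig()
	// Placeholder connection settings -- adjust to your environment.
	cfg.Addr = "127.0.0.1:3306"
	cfg.User = "repl"
	cfg.Password = "secret"

	c, err := canal.NewCanal(cfg)
	if err != nil {
		log.Fatal(err)
	}
	c.SetEventHandler(&rowHandler{})

	// Stream binlog events starting from the current master position.
	pos, err := c.GetMasterPos()
	if err != nil {
		log.Fatal(err)
	}
	if err := c.RunFrom(pos); err != nil {
		log.Fatal(err)
	}
}
```

Unlike polling on `updated_at`, this approach also sees DELETE events, which is what makes it suitable for keeping Elasticsearch fully in sync.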

Rammgarot