
I'm new to ES and a little confused about how to work with MySQL & ES together.

So, I want to use both an ES DB and a MySQL DB - with ES always being a copy of my MySQL DB, in order to perform searches and get data faster.

My idea was to save, update and delete data in ES after the SQL command succeeds. But then I heard about the JDBC river, and I don't know which approach is better.

I think the first way I mentioned is better, but there are so many opinions around the web that I don't know which ones are relevant.

Snow
This answer might help: https://stackoverflow.com/questions/34477095/elasticsearch-replication-of-other-system-data/34477639#34477639 – Val Jun 21 '17 at 12:36

1 Answer


It completely depends on your use case. If the data insertion rate is low, you can go with the first approach (after a successful insert into the RDBMS, push the same data into Elasticsearch), as sketched below. If the insertion rate is very high, it is better to write the data to MySQL first and then poll MySQL every 1 or 5 minutes using the Logstash JDBC input plugin. The advantage of the second approach is that it is much less likely to create network overhead on your server.
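For the first approach, a minimal sketch could look like the following. It assumes a products table with name, category and price columns, the mysql-connector-python and elasticsearch (7.x-style) client libraries, and illustrative connection details; adapt all of these to your own schema and environment.

import mysql.connector
from elasticsearch import Elasticsearch

# Illustrative connection details -- adjust to your environment
db = mysql.connector.connect(
    host="localhost", user="roopendra", password="roopendra", database="ecommerce"
)
es = Elasticsearch(["http://localhost:9200"])

def add_product(product):
    # Write to MySQL first; only index into Elasticsearch if the insert succeeded
    cursor = db.cursor()
    cursor.execute(
        "INSERT INTO products (name, category, price) VALUES (%s, %s, %s)",
        (product["name"], product["category"], product["price"]),
    )
    db.commit()
    # Reuse the MySQL primary key as the ES document id so both stores stay aligned
    es.index(index="products", id=cursor.lastrowid, body=product)

add_product({"name": "USB cable", "category": "electronics", "price": 4.99})

The same pattern applies to updates and deletes: re-index with the same id to update, and call es.delete with that id to remove the document.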

I am using the Logstash JDBC input plugin in my use case. For example:

input {
  jdbc {
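    # Path to the MySQL JDBC driver jar and the driver class to load (adjust to your install)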
    jdbc_driver_library => "/home/roop/logstash/mysql-connector-java-5.1.36.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ecommerce"
    jdbc_user => "roopendra"
    jdbc_password => "roopendra"
    parameters => { "product_category" => "electronics" }
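    # Cron-style schedule: run the query every minute; :product_category is filled from "parameters" above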
    schedule => "* * * * *"
    statement => "select * from products where category = :product_category"
  }
}
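# Send each row returned by the query to Elasticsearch as a separate document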
output {
    elasticsearch {
        index => "products"
        document_type => "product"
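        # A unique column from the query result, so repeated polls update documents instead of duplicating them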
        document_id => "%{unique_id_field_name}"
        hosts => ["localhost:9200"]
    }
}
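
Assuming the configuration above is saved as, for example, mysql-to-es.conf, you can run it with bin/logstash -f mysql-to-es.conf. Logstash will then execute the query on the given schedule and index the result rows into the products index.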

For Reference

Roopendra